Abstract

In  modern  communication  systems,  bandwidth  is  a  limited  commodity.  Bandwidth 
efficient  systems  are  needed  to  meet  the  demands  of  the  ever-increasing  amount  of  data  that 
users  share.  Of  particular  interest  is  the  U.S.  Military,  where  high-resolution  pictures  and 
video  are  used  and  shared.  In  these  environments,  covert  communications  are  necessary 
while  still  providing  high  data  rates.  The  promise  of  multi-antenna  systems  providing 
higher  data  rates  has  been  shown  on  a  small  scale,  but  limitations  in  hardware  prevent  large 
systems  from  being  implemented. 

Discussed  here  are  the  effects  of  the  topology  of  communication  nodes  on  Inter-Block 
Interference  in  Orthogonal  Frequency  Division  Multiplexing  (OFDM)  systems.  This  effect 
can  be  leveraged  such  that  eavesdroppers  experience  a  lower  Signal  to  Interference  plus 
Noise Ratio (SINR), resulting in a poor-quality communication link. Simulations show
that an eavesdropper has a 10 dB worse SINR. The reverse is also considered, where the
point of view of the eavesdropper is taken and a study into improving the eavesdropping
communication link is performed. A pivotal calculation for the eavesdropper is found to be
the estimation of the time of arrival of the received waveforms. The relative delays between
users' waveforms are used to reduce the interference at the eavesdropper. The van de Beek
and  Acharya  methods  are  considered.  Simulations  and  experiments  show  that  the  Acharya 
method  provides  a  more  accurate  measurement.  Also  discussed  are  hardware  limitations 
such as on-board slice logic and Digital Signal Processing (DSP) resource blocks. The
utilization of these logic blocks proves to be a limiting factor in large-scale multi-antenna
systems. In particular, the inversion and equalization processes are the most expensive in
terms  of  computation  time  and  hardware  resources.  The  trade-off  between  data  rate  and 
resource  usage  is  provided  with  comments  on  interfacing  multiple  FPGAs  to  provide  more 
available  resources. 
SCALABLE SYSTEM DESIGN FOR COVERT MIMO COMMUNICATIONS


I.  Motivation 

Within the last 20 years, major advances in Multiple-Input Multiple-Output (MIMO)
communication technology have occurred, from the groundbreaking work of Telatar [1] and
Foschini [2] to Alamouti codes and Space-Time Block Codes (STBC), which provide
a more reliable data link [3]. In more recent years, work has been aimed at the effects of the
physical  limitations  of  antenna  spacing  experienced  in  cellular  communication  technology 
[4]. 

The attractive attribute of MIMO communication techniques is the improved
bandwidth efficiency. Today bandwidth is a limited commodity, so using the
limited resources efficiently is a must, and MIMO communications provide that feature. As the
amount of digital data around the world increases, the need to share that data with the rest
of the world rises as well. Meeting this need requires bandwidth to provide the desired
data rates.

The  United  States  (US)  Military  has  a  particular  interest  in  MIMO  communications, 
where  communication  between  Tactical  Operation  Centers  (TOC),  ground  troops,  air 
support, and heavy artillery is needed. Unmanned Aerial Vehicles (UAVs) are taking
high-resolution pictures and video from all around the world. The environments in which this
data  is  taken  are  often  hostile.  In  these  situations  covert  communications  are  needed  to 
provide  the  bandwidth  needed  to  get  the  pictures,  video,  voice  data,  and  intelligence  to  the 
right  people. 

In the literature, [31] looks at utilizing MIMO spatial gains to reduce transmission
power, thus providing a Low Probability of Detection (LPD) waveform, and [32] utilizes
Precoding Matrix Index (PMI) secret keys to provide secure communications. It has
been found that typical encryption algorithms are susceptible to bit errors, which in turn
reduces the throughput of the system [33]. Physical layer methods, in some instances, utilize
mechanisms that do not experience this phenomenon, thus preserving throughput [34]. This
is  advantageous  especially  when  these  physical  layer  techniques  are  used  in  tandem  with 
traditional  encryption  making  the  system  more  robust  to  security  threats. 

Often in these systems multiple users share resources. A good example of this is
cellular communications, where the shared resource is bandwidth and users are
allocated bandwidth dynamically as needed. An adverse result of this bandwidth
sharing is degraded timing recovery of the received signal, which reduces the
fidelity of the communication link. As the number of users increases, the data
rate requirements increase as well, and because of this added requirement dynamic allocation is
used to reuse resources. However, when this occurs, non-contiguous bands of spectrum are
allocated to users, which also reduces timing recovery ability.

In recent years Orthogonal Frequency Division Multiplexing (OFDM) and its
variations have become increasingly popular. In 4G communications, satellite radio,
and Wireless Local Area Networks (WLAN), OFDM has been utilized for its multi-path
resistance. OFDM relies on the Cyclic Prefix (CP) to transform linear convolution into
circular convolution; however, this adds structure that can be exploited by an Eavesdropping
Receiver (Ex). The Ex can simply use correlation to determine where symbol boundaries
occur and demodulate the payload. Some work has been done to reduce this structure by
including random data in random OFDM blocks [35]. A transmitted signal with varying
OFDM block lengths makes it harder to gain an accurate timing estimate.

Due to OFDM's vulnerability to phase errors, timing estimates are required to be
accurate. For this reason, jamming techniques for OFDM often focus on disrupting the
receiver's ability to gain accurate symbol timing [36]. In this particular instance the Ex is
also actively disrupting the signal for the intended receiver (Rx).

Another  potential  source  of  interference  in  OFDM  systems  is  caused  by  the  channel 
impulse  response  itself,  if  the  CP  is  shorter  than  the  impulse  response  [18].  This  common 
problem  in  OFDM  can  be  exacerbated  by  the  topology  used  in  the  system  [37].  If  there 
is  a  difference  in  arrival  times  between  multiple  sources,  the  delay  effectively  adds  to  the 
length  of  the  impulse  response.  If  this  delay  is  known,  the  CP  length  can  accommodate  the 
longer impulse response, or synchronization techniques can be used to avoid delay-related
interference. If the delay cannot be characterized, or the channel itself is longer than
the cyclic prefix, Inter-Block Interference (IBI) is introduced, resulting in a lower Signal to
Interference  plus  Noise  Ratio  (SINR). 

In [38], Inter-Symbol Interference (ISI) is used to degrade a non-cooperative receiver's
performance. The transmitted signal is preconditioned with columns of
the  Singular  Value  Decomposition  (SVD)  of  the  channel  convolution  matrix  in  such  a  way 
that  Rx  can  demodulate  the  signal  but  the  Ex  experiences  a  coded  signal  distorted  by  the 
wireless  channel. 

Capacity gains over a Single-Input Single-Output (SISO) system have been shown
for systems with two transmitters and two receivers, denoted 2x2, and also for 4x4 systems [5-7].
Theoretically, larger systems offer higher gains; however, hardware technology limits
fully functional large MIMO systems. Research into larger systems provides operational
techniques needed for higher capacity gains in realizable systems.

The computational complexity of the MIMO receiver pushes the boundaries of
modern processing platforms [28]. This is apparent in the literature, where researchers
reduce computational complexity by focusing on the
complex subprocesses of the receiver algorithm. For example, the architecture of a Single
Instruction Multiple Data (SIMD) co-processor is designed for symbol recovery in a Field
Programmable Gate Array (FPGA) [42]. The design of this co-processor aims to balance
resource usage and latency. By reducing the latency of the co-processor, the data rate is
potentially increased.

Some  literature  assumes  a  realistic  data  rate  for  the  receiver  to  service.  Under  the 
assumption  of  burst  mode  communication,  the  receiver  has  a  maximum  amount  of  time  to 
process  a  data  block  [43].  To  process  the  data  block  quickly,  custom  hardware  is  interfaced 
with a processor, such as the MicroBlaze, to reduce latency [44].

Reducing latency and FPGA resource usage are not the only constraints. An optimal or
near-optimal Bit Error Rate (BER) is also desired. Investigations into FPGA implementations of the
K-Best and Trellis-search algorithms show near-optimal equalization and bit mapping, with
the resources used by each algorithm cited in [45] and [46], respectively.

A  MIMO  Square  Root  Decoder  reduces  the  complexity  of  the  pseudo-inverse 
calculation  while  maintaining  a  constant  BER  [47],  while  [32]  focuses  on  computing  the 
inversion  by  using  some  approximations.  The  exact  inversion  and  approximate  inversion 
are weighed in [48], which shows that the number of antennas at the base station is the
gauge by which to determine the best algorithm for a Very Large Scale Integration (VLSI)
implementation. 

The  rest  of  this  document  is  organized  as  follows.  In  Chapter  II,  background 
information  is  covered  starting  with  a  standard  notation  set  for  the  rest  of  the  document. 
Next, multi-carrier waveforms are discussed, highlighting OFDM and Orthogonal
Frequency Division Multiple Access (OFDMA). Matrix decomposition algorithms
such as the LU and QR decompositions are introduced, which are tools used to handle the
inversion of the channel matrix in hardware. The second chapter concludes with an
overview of the FPGA.

Next, Chapter III discusses the first of three MIMO-specific problems. It
considers the delays between the transmitted waveforms of randomly distributed
transmitters in a circular region of activity. The topologically induced delay induces IBI,
Inter-Carrier Interference (ICI), and ISI in multi-carrier systems. The performance of a
receiver suffering from the effects of IBI, ICI, and ISI is considered.

Then, Chapter IV builds further on the topology and system described in Chapter III.
The performance of the eavesdropping receivers would increase if timing estimates were
calculated and used in the equalization process. The eavesdropper's point of view is taken:
by estimating the timing delays, the degrading effect of the delays
can be reduced. An implementation of the algorithms is discussed and real-world results
are provided.

Chapter V then moves to the perspective of the intended receiver. Since data rate is
the primary concern, and considering the lessons learned in Chapter III, a MIMO receiver
is developed in VHSIC Hardware Description Language (VHDL). Resource utilization is
reported and the tradeoffs between data rate and resource usage are weighed. Furthermore,
trends for resource usage as a function of the number of antennas used are extrapolated to
give some intuition for the size of implementation needed for a desired data rate.

Chapter  VI  concludes  the  dissertation  with  a  summary  of  the  three  contributions  and 
provides  final  thoughts  on  future  work. 
II. Background MIMO Theory


This chapter outlines background information about MIMO communications. First, in
Section 2.1, an overview of the notation used in this dissertation is provided. In Section 2.2,
multi-carrier waveforms are discussed. Section 2.3 provides the system model and topology
of transmitters and receivers used throughout this dissertation. Taking the point of view of
an eavesdropper, Section 2.4 discusses the ability to estimate arrival times in a multiple
user  scenario.  This  provides  the  ability  to  equalize  the  non-cooperative  users’  waveforms 
with  less  interference  but  also  provides  the  basis  for  Time  Difference  of  Arrival  (TDOA) 
positioning  efforts.  Finally,  Section  2.5  provides  an  overview  of  FPGAs  in  general  and 
also  the  Wireless  open-Access  Research  Platform  (WARP)  boards  specifically  used  in  this 
dissertation. 

2.1  Notation 

To  analyze  MIMO  systems,  some  notation  is  needed  to  manage  the  waveforms  being 
transmitted  and  received. 

2.1.1  Vectors  and  Matrices. 

Vectors and matrices are used quite often in analyzing MIMO systems. Column
vectors are denoted by boldface lowercase letters, such as the complex-valued vector $\mathbf{x}$. The use of the
transpose, $(\cdot)^T$, or conjugate transpose, $(\cdot)^H$, denotes row vectors. In MIMO-specific vectors,
the transmit vector $\mathbf{x}$ can be defined as a vector of symbols across the transmit antennas as
a function of time, or $\mathbf{x}$ can be defined as a vector of symbols that are transmitted on a single
antenna. These two definitions are used on a case-by-case basis and $\mathbf{x}$ is defined explicitly
to avoid any ambiguity.

Matrices are denoted by boldface uppercase letters, such as $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$. In MIMO
systems $\mathbf{H}$ denotes the channel matrix. Estimates of any variable are denoted by the same
letter of the variable with a hat; for example, an estimate of the channel matrix is denoted $\hat{\mathbf{H}}$.

2.1.2  MIMO  Specific  Variables. 

Table 2.1 contains a list of all the MIMO-specific variables used in this dissertation.
A frequency domain vector is distinguished from a time domain vector by the tilde; for example,
$\tilde{\mathbf{h}}$ is the frequency response where the impulse response is $\mathbf{h}$. The remaining variables
in Table 2.1 represent general system parameters such as the number of antennas at the
transmitter and receiver.


Table 2.1: MIMO-specific variables

Variable              Description                     Variable        Description
--------------------  ------------------------------  --------------  -------------------------------
$\mathbf{H}$          channel matrix                  $\mathbf{h}$    impulse response
$\tilde{\mathbf{h}}$  frequency response              O               covariance of transmit waveform
$N_t$                 number of transmitters          $N_r$           number of receivers
$N_s$                 number of samples               $N_p$           length of preamble ($< N_s$)
$B$                   bandwidth                       $M$             modulation order
$P_t$                 total transmit power            $L$             impulse response length
$N_{pil}$             number of pilot tones           $N$             number of sub-carriers
$N_{dt}$              number of data tones            $\mathbf{k}$    vector of sub-carrier indices
$\mathbf{k}_{pil}$    vector of pilot tone indices    $N_a$           $N_a = N_r = N_t$ in Ch. 5

2.2  Multi-Carrier  Waveforms 

Channel equalization in single-carrier waveforms involves the computationally
intensive process of deconvolution, which grows more expensive as the impulse response
gets larger. For this reason, multi-carrier waveforms are introduced in this section.
By defining the information-carrying symbols in the frequency domain, the convolution
process becomes a multiplication process, and equalization by per-sub-carrier
division is a better computational trade-off than deconvolution.
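
The identity behind this trade-off can be sketched in a few lines. The following is a minimal illustration, not the receiver used in this dissertation: it assumes a noise-free channel, a perfectly known impulse response, and circular convolution (which the CP provides in practice). The function names and the example values of `x` and `h` are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def circular_convolve(h, x):
    """Circular convolution of x with h, where h is zero-padded to len(x)."""
    n = len(x)
    h = list(h) + [0.0] * (n - len(h))
    return [sum(h[l] * x[(m - l) % n] for l in range(n)) for m in range(n)]

def one_tap_equalize(y, h):
    """Undo the channel with one complex division per frequency bin."""
    n = len(y)
    H = dft(list(h) + [0.0] * (n - len(h)))
    Y = dft(y)
    return idft([Y[k] / H[k] for k in range(n)])

# Hypothetical example values, chosen so that no frequency bin of h is zero.
x = [1, -1, 1, 1, -1, 1, -1, -1]
h = [1.0, 0.5, 0.25]
y = circular_convolve(h, x)
x_hat = one_tap_equalize(y, h)
```

Here `x_hat` matches `x` up to floating-point error, using N divisions instead of a time-domain deconvolution whose cost grows with the impulse response length.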

2.2.1  OFDM  Waveforms. 

As  MIMO  communication  systems  gain  popularity  because  of  their  higher  bandwidth 
efficiency,  they  are  replacing  SISO  communication  systems.  When  replacing  SISO 
systems,  the  MIMO  system  inherits  a  large  bandwidth  with  which  to  operate  [8].  As  with 
any  large  bandwidth  system,  multi-path  effects  are  more  prevalent.  As  a  result,  multi-carrier 
waveforms,  such  as  OFDM,  are  more  attractive  because  of  the  multi-path  resistance  of  the 
waveform.  Two  mechanisms  are  utilized  by  OFDM  to  provide  resistance  to  multi-path: 
frequency domain pilot-assisted channel estimation, and a CP, also referred to as guard
time. Details of these mechanisms are described in Section 2.2.1.1 and Section 2.2.1.2,
respectively.

Unfortunately,  a  drawback  to  OFDM  is  sensitivity  to  synchronization  error.  A  popular 
method  for  synchronizing  OFDM  signals  is  exploiting  the  CP  structure  of  the  OFDM 
waveform.  Correlation  is  used  to  determine  when  the  samples  repeat,  corresponding 
to  the  prepended  CP  samples  and  the  end  of  the  OFDM  symbol.  Errors  occur  in  this 
estimation  technique  when  the  Signal  to  Noise  Ratio  (SNR)  is  low.  Another  contributing 
factor for synchronization error is the impulse response. If the impulse response has a
dominant channel tap at a positive delay $\ell^* > 0$ resulting from a Non Line-of-Sight (NLOS)
waveform, the energy of the signal is delayed by $\ell^*$. Correcting for this effect utilizes the
channel estimate. Using correlation to exploit the CP structure and determining the Time Of
Reference (TOR) is discussed in Section 2.2.1.2.
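
As an illustration of the correlation idea, the sketch below slides a window over the received samples and correlates each CP-length segment with the segment one OFDM symbol later. It is a simplified, noise-free example with a single symbol surrounded by silence, not one of the estimators evaluated later in this dissertation; the function names, the test signal, and all parameter values are assumptions made for the example.

```python
import cmath

def cp_correlation_metric(r, n_fft, n_cp):
    """|sum_i r[d+i] * conj(r[d+i+n_fft])| over an n_cp-sample window, for each offset d."""
    metric = []
    for d in range(len(r) - n_fft - n_cp + 1):
        acc = sum(r[d + i] * r[d + i + n_fft].conjugate() for i in range(n_cp))
        metric.append(abs(acc))
    return metric

def estimate_symbol_start(r, n_fft, n_cp):
    """Index of the first CP sample, taken as the offset with the largest metric."""
    metric = cp_correlation_metric(r, n_fft, n_cp)
    return max(range(len(metric)), key=lambda d: metric[d])

# Hypothetical test signal: unit-modulus stand-in samples, no noise, no channel.
n_fft, n_cp, lead = 16, 4, 5
body = [cmath.exp(0.7j * m * m) for m in range(n_fft)]
symbol = body[-n_cp:] + body            # prepend the cyclic prefix
r = [0j] * lead + symbol + [0j] * 7     # silence before and after the symbol
start = estimate_symbol_start(r, n_fft, n_cp)
```

In this idealized case the metric forms a triangular peak centered on the true symbol start. With noise or a dispersive channel the peak flattens and shifts, which is the source of the synchronization errors described above.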
2.2.1.1 Optimal Pilot Schemes.

Pilot  assisted  channel  estimation  assigns  pilot  tones  in  the  frequency  domain  with  the 
purpose  of  estimating  the  frequency  response  at  the  receiver.  The  number  of  pilot  tones, 
the  amount  of  power  each  tone  is  allotted  from  the  fixed  power  budget  and  where  the 
pilots  are  placed  are  design  parameters.  How  these  design  parameters  affect  the  channel 
estimate  accuracy,  throughput,  and  capacity  is  considered  in  this  section.  If  only  channel 
estimation  accuracy  is  considered,  all  the  sub-carriers  should  be  pilot  tones.  This  results 
in  an  accurate  channel  estimate  but  no  data  is  transmitted.  However,  a  channel  estimate 
is  needed  to  accurately  receive  data  so  using  only  data  sub-carriers  would  result  in  a  low 
fidelity  channel.  The  goal  of  this  section  is  to  determine  a  balance  between  these  criteria 
and  maximize  capacity. 

It  is  shown  in  [9,  10]  that  the  optimal  number  of  pilots  with  respect  to  capacity  is  L. 
The  frequency  response  is  the  Fourier  Transform  of  the  impulse  response.  The  impulse 
response  is  L  samples  in  duration  and  the  N  resulting  frequency  response  samples  only 
have  L  degrees  of  freedom.  These  L  pilots  are  also  equally  spaced  and  the  power  allocated 
to  a  single  pilot  tone  is  the  total  power  allocated  to  pilot  tones  divided  by  L.  This  results  in 
L  equally  spaced  and  equally  powered  pilot  tones. 
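
A small sketch of this allocation is given below. It assumes, for simplicity, that L divides N and that the first pilot sits on sub-carrier 0; neither choice comes from [9, 10], and the function name is hypothetical.

```python
def equispaced_pilots(n_subcarriers, n_pilots, pilot_power_total):
    """Return (pilot indices, power per pilot) for equally spaced, equally powered pilots."""
    if n_subcarriers % n_pilots != 0:
        # Assumption of this sketch only: the spacing must be an integer.
        raise ValueError("n_pilots must divide n_subcarriers in this sketch")
    spacing = n_subcarriers // n_pilots
    indices = [p * spacing for p in range(n_pilots)]
    return indices, pilot_power_total / n_pilots

# Hypothetical example: N = 64 sub-carriers, L = 4 channel taps, unit pilot power budget.
indices, power_per_pilot = equispaced_pilots(64, 4, 1.0)
```

For these example values the pilots fall on sub-carriers 0, 16, 32, and 48, each with a quarter of the pilot power budget.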

Each  pilot  tone  is  allocated  the  same  amount  of  power,  but  a  pilot  tone  and  information 
tone are not necessarily allocated the same power. It is shown in [9, 11] that the amount
of power allocated is dependent on L and N. Their result is found by maximizing capacity,
which considers a balance between channel estimation accuracy and symbol estimation.
Under the condition where L = N, equal power is allocated to both the pilots and
information tones. However, when L < N, more power is allocated to the information
tones. 

So far the sub-carriers at which the pilots are located have been discussed. The amount
of power allocated to the pilots relative to the information sub-carriers has also been optimized, and
the pilot tones are equally powered. The values used for the pilots have not been discussed;
this is the topic of [12]. For a single transmitter, the L pilots are equally powered and
equally spaced. For multiple transmitters with a flat fading channel ($L = 1$), the pilots are
also orthogonal, but for frequency selective channels ($L \geq 2$) the pilots should be phase-shift
orthogonal. The level of correlation of the pilot tones across the transmitters is related
to how well the pilots are suited to estimate the channel [13]. If the pilots are phase-shift
orthogonal, the correlation is zero, but with Radio Frequency (RF) front-ends, correlation
may be introduced.

Described above are methods used for allocating pilot tones to be transmitted to the
receiver for channel estimation. Unfortunately, the tones have to be known at the receiver,
in which case information cannot be sent during this time. As an alternative, channel
tracking is also a valid method of channel equalization: pilots and preambles are
used at the beginning of transmissions to estimate the channel, and the channel fluctuations
are then tracked via a reduced number of pilots or blindly; this scheme is presented in [14].
Analysis under Rayleigh [15] or Ricean [16] channel conditions is also explored using the
Extended Kalman Filter.

2.2.1.2  Cyclic  Prefix  Length. 

The CP in OFDM waveforms offers performance-boosting capabilities. The length of
the CP must be chosen carefully: it should be just long enough to
capture the entire impulse response of the channel. If it is too long, time is spent transmitting
redundant information, which negatively impacts capacity. Reducing the CP length increases
capacity until the CP is shorter than the impulse response; in this case interference is
induced, which negatively impacts the fidelity of the communication link.
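
The cost of the redundant samples can be made concrete with a short worked example. Writing $N_{cp}$ for the CP length in samples, the fraction of transmitted samples that carry data is $N / (N + N_{cp})$. The numbers below are hypothetical and chosen only for illustration:

$$\eta = \frac{N}{N + N_{cp}}, \qquad \eta\big|_{N=64,\ N_{cp}=16} = \frac{64}{80} = 0.8, \qquad \eta\big|_{N=64,\ N_{cp}=8} = \frac{64}{72} \approx 0.889.$$

Halving the CP in this example recovers roughly nine percentage points of efficiency, which is only a net gain if the channel's impulse response still fits inside the shorter CP.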

OFDM waveforms experience no IBI, ISI, or ICI when the CP is longer than the
channel's impulse response [17-21]. In practice, however, the impulse response can be
longer than the cyclic prefix. The energy outside the CP degrades the system performance.
The amount of interference induced on the N sub-carriers is then characterized to obtain
an estimate of the SINR of the system. If IBI, ISI, and ICI occur, choosing the Time Of
Reference (TOR) intelligently mitigates the power of the interference.

The  analysis  of  IBI,  ISI,  and  ICI  is  found  in  [17,  18,  21].  Ref.  [17]  analyzes  how 
the  OFDM  symbol  length,  CP  length,  and  timing  mismatch  with  a  certain  variance  affect 
the  SINR  and  capacity.  This  chapter  also  considers  the  trade-off  between  CP  length  and 
capacity. Lengthening the CP degrades capacity, so if the interference
induced by keeping the CP the same length reduces performance less than
lengthening the CP would, accepting the interference is the better trade-off.

The interference induced by a particular channel tap increases linearly as the channel
tap index increases outside the CP. As shown in [18], the energy at channel tap $N_{cp} + 1$ does not
contribute to the power of the interference as much as the same energy at channel tap $N_{cp} + 10$;
because of this, [18] chooses the optimal TOR to reduce IBI effects. For example, the
energy at channel tap $N_{cp} + 1$ is weighted by a factor of 1 and the energy at $N_{cp} + 10$ is
weighted by 10. So the energy at $N_{cp} + 10$ is probably smaller but, because of the weighting
factor, may contribute more to the power of the interference. This calculation of the IBI is
shown explicitly here [18]:

$$P_{IBI} = 2\sigma_x^2 \sum_{n \geq 0} (n + 1) \left[ h_a^2(n) + h_b^2(n) \right], \tag{2.1}$$

where $h_a(n)$ and $h_b(n)$ represent the portions of the impulse response outside the CP: $h_a(n)$
denotes the portion before the CP starts and $h_b(n)$ represents the portion after the CP ends.
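
The weighting in Equation (2.1) can be checked numerically with a short sketch. The function below assumes the two out-of-CP portions of the impulse response have already been separated into lists indexed from n = 0, and it uses the squared magnitude of each tap so that complex taps are handled. The name `ibi_power`, the parameter `sigma2` (standing for the leading variance factor), and the tap values are hypothetical.

```python
def ibi_power(h_a, h_b, sigma2=1.0):
    """Evaluate 2 * sigma2 * sum_n (n + 1) * (|h_a(n)|^2 + |h_b(n)|^2)."""
    total = 0.0
    for taps in (h_a, h_b):
        for n, tap in enumerate(taps):
            # Tap n outside the CP is weighted by (n + 1).
            total += (n + 1) * abs(tap) ** 2
    return 2.0 * sigma2 * total

# Hypothetical comparison: a unit tap just outside the CP versus a
# half-amplitude tap ten samples outside the CP.
near_tap = ibi_power([], [1.0])
far_tap = ibi_power([], [0.0] * 9 + [0.5])
```

Although the far tap carries only a quarter of the energy of the near tap, its weight of 10 makes its contribution to the IBI power larger, which is the effect described in the text.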

Another technique to minimize the power of the interference, or equivalently
maximize the SINR, uses pilot tones in the time domain to synchronize the Discrete
Fourier Transform (DFT) window [19]. Interference is also a problem in situations where
the transmitters are not synchronized at the receiver. Delay between received signals can
be represented as a longer impulse response. This is the case in cellular networks where
geometry introduces delay into the system. In this situation, the channel length may be
less than the CP length, but the delay between received signals makes the resulting impulse
response violate this constraint [20].

Characterization of the SINR in SISO communications starts with the definition of a
received OFDM signal, $d(n)$, that models an ideal circular convolution. The transmitted
signal, $x_{n_t}(n)$, for this idealized signal does not contain a CP; instead the N-point OFDM
block is repeated.

The traditional OFDM signal that contains a CP is then denoted by $x_{cp,n_t}(n)$. IBI is
then induced by the channel if the channel is longer than the CP in the transmitted signal.
At the receiver, $y(n)$ is then the received signal with the effects of IBI included.

The amount of IBI at each time domain sample is then the difference between $d(n)$ and $y(n)$,
$q(n) = d(n) - y(n)$, which is put into vector form as $\mathbf{q} = [q(0)\ q(1)\ \ldots\ q(N - 1)]^T$. The difference
between the true data symbols and the estimated symbols is then denoted $e(k)$. The Power
Spectral Density (PSD) of this error is then [18]

$$S_e(k) = \frac{\mathcal{F}\{r_q(m)\}}{|\tilde{h}(k)|^2}. \tag{2.2}$$

Here $\mathcal{F}\{\cdot\}$ is the Fourier transform operator and $r_q(m)$ is the autocorrelation function of $\mathbf{q}$.
This IBI analysis is extended to MIMO communications in Section 3.1.2.


2.2.2  OFDMA. 


OFDMA  is  a  waveform  used  in  the  Long-Term  Evolution  (LTE)  standard  for  allowing 
multiple  user  access.  This  waveform  is  only  used  in  the  communication  from  the  cell  phone 
tower  to  the  mobile  user.  Each  user  is  assigned  a  time-frequency  slot  or  resource  block 
where  the  user  can  transmit  its  data  on  the  sub-carriers  in  the  resource  block.  A  resource 
block  consists  of  12  sub-carriers  and  data  symbols  are  allocated  to  the  12  sub-carriers  in 
the same fashion as an OFDM system [22].

A drawback of OFDMA and OFDM is that the Peak-to-Average Power Ratio (PAPR)
is high [22]. A higher PAPR requires a more expensive power amplifier with a larger
linear range. For this reason, OFDMA is only used in the communication from the cell
phone tower to the mobile user. For the communication from the mobile user to the cell
phone tower, Single Carrier - Frequency Division Multiple Access (SC-FDMA) is used
because it has a reduced PAPR compared to OFDMA and OFDM.

2.2.3  SC-FDMA. 

The difference between OFDMA and SC-FDMA is that SC-FDMA requires an added Fast Fourier Transform
(FFT) operation before the symbols are assigned to the subcarriers;
consequently, an added FFT operation is needed at the receiver as well. In the SC-FDMA
waveform, the Quadrature-Amplitude Modulation (QAM) symbols are defined in the time
domain, and an FFT operation converts the time domain data to the frequency domain. The
frequency domain coefficients are then assigned to the 12 sub-carriers allotted to the
user. The Inverse Fast Fourier Transform (IFFT) is used to convert the frequency domain
information to the time domain to determine the transmit waveform. With this waveform
the feature of providing access to multiple users is preserved, while reducing PAPR [22].
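
The two transmit chains can be compared with the sketch below. It uses a naive DFT in place of an FFT and a contiguous sub-carrier mapping. The usage example is deliberately degenerate: the user occupies every sub-carrier and sends identical symbols, which is a worst case for OFDMA chosen so the result is easy to verify by hand. The function names and all values are assumptions made for this illustration.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def ofdma_modulate(symbols, n_fft, first=0):
    """Place symbols directly on contiguous sub-carriers, then convert to the time domain."""
    grid = [0j] * n_fft
    grid[first:first + len(symbols)] = symbols
    return idft(grid)

def scfdma_modulate(symbols, n_fft, first=0):
    """Same as OFDMA, but with the added transform applied to the symbols first."""
    return ofdma_modulate(dft(symbols), n_fft, first)

def papr(x):
    """Peak power divided by average power of a block of samples."""
    powers = [abs(v) ** 2 for v in x]
    return max(powers) / (sum(powers) / len(powers))

# Hypothetical worst case for OFDMA: 12 identical symbols on a 12-point grid.
symbols = [1 + 0j] * 12
papr_ofdma = papr(ofdma_modulate(symbols, 12))
papr_scfdma = papr(scfdma_modulate(symbols, 12))
```

In this example the OFDMA block collapses into a single time-domain peak, giving a PAPR of 12, while the SC-FDMA block reproduces the constant-modulus symbols and has a PAPR of 1. With a larger grid and random QAM symbols the gap is smaller, but the single-carrier structure is what [22] credits for the reduced PAPR.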

2.3  System  Model 

In this section, the topology for the MIMO communication system is described. The
locations of the $N_t$ transmitters and the collocated $N_r$ receive antennas play a vital role
in SINR characterization. The position of the Ex relative to the transmitters also plays a
role in the performance that the Ex can expect. Following that, the model
for traditional MIMO communications is outlined along with Channel Estimate (CE)
algorithms for the simulations.

2.3.1  Proposed  Topology. 

Figure 2.1 shows the topology of the transmitters and receivers. The Region of
Activity (ROA) is pictured, which represents the area in which transmitters are located.
The transmitters are communicating with Rx in the direction of $\theta = 0$, where $\theta$ is measured
with respect to the line from the ROA center to Rx. At $\theta \neq 0$, an Ex is potentially present.

Figure 2.1: Region of Activity (ROA) of transmitters communicating with Rx. Also
depicted are other eavesdropping receivers close to the ROA.


In the ROA, the $N_t$ transmitters are considered to be equipped with one antenna each. The Rx
and Ex are equipped with $N_r \geq N_t$ antennas each.

In  the  ROA,  the  transmitter  positions  are  randomly  distributed.  First,  the  uniform 
distribution  is  considered  for  transmitter  locations.  The  radius  of  the  ROA  then  corresponds 
to  the  outer  limit  a  transmitter  can  be  positioned.  Later,  a  Gaussian  distribution  is  used  to 
model  transmitter  positions  where  the  radius  of  the  ROA  is  related  to  the  variance  of  the 
Gaussian  distribution. 
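
A minimal sketch of both placement models follows. The uniform case draws the radius as the ROA radius times the square root of a uniform variate, which is the standard way to obtain a constant density over the area of a disk. For the Gaussian case the standard deviation is left as a free parameter, because the exact relation between the ROA radius and the variance is not stated here. The function names and example values are hypothetical.

```python
import math
import random

def sample_uniform_disk(n_tx, radius, rng):
    """Draw n_tx transmitter positions uniformly over a disk centered at the origin."""
    points = []
    for _ in range(n_tx):
        r = radius * math.sqrt(rng.random())    # sqrt gives uniform density per unit area
        phi = 2.0 * math.pi * rng.random()
        points.append((r * math.cos(phi), r * math.sin(phi)))
    return points

def sample_gaussian(n_tx, sigma, rng):
    """Draw n_tx positions from a zero-mean circular Gaussian with standard deviation sigma."""
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_tx)]

# Hypothetical example: four transmitters in a 100 m ROA.
rng = random.Random(1)
uniform_tx = sample_uniform_disk(4, 100.0, rng)
gaussian_tx = sample_gaussian(4, 50.0, rng)
```

Drawing the radius without the square root would cluster transmitters near the ROA center, which would understate the delay spread seen by a receiver.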

The transmitters are synchronized in the direction of Rx. This reduces channel effects
such as IBI in the cooperative system. However, the Ex at some $\theta$ does not have this luxury
of synchronizing with the transmitters. If the performance of the Ex is reduced while
maintaining the fidelity for the Rx, then leveraging IBI is a valid physical layer security
technique.


2.3.2  Traditional  MIMO-OFDM  Communications. 

In traditional MIMO communications a channel matrix, denoted $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$, represents the frequency-flat complex channel between each transmitter and receiver pair $(n_t, n_r)$. For a frequency selective channel, the received time-domain signal is modeled as:

$$
\mathbf{y}_{n_r} = \left( \sum_{n_t=1}^{N_t} \mathbf{h}_{n_r,n_t} \star \mathbf{x}_{n_t} \right) + \mathbf{n} \tag{2.3}
$$

where $\star$ denotes convolution and $\mathbf{y}_{n_r}$ is a series of samples indexed by $n$, $\mathbf{y}_{n_r} = [y_{n_r}(0)\ y_{n_r}(1)\ \cdots\ y_{n_r}(N' + L - 2)]^T$, where $N$ is the number of subcarriers, $\mathbf{x}_{n_t}$ is the transmitted signal including the $N_{cp}$ CP samples, $N' = N + N_{cp}$, $\mathbf{x}_{n_t} = [x_{n_t}(0)\ \cdots\ x_{n_t}(N' - 1)]^T$, and $\mathbf{n}$ is Additive White Gaussian Noise (AWGN) with zero mean and power $\sigma_n^2$, $\mathbf{n} \sim \mathcal{CN}(0, \sigma_n^2)$. The Rayleigh fading frequency selective channel between the $n_t$ transmitter and the $n_r$ receiver is $\mathbf{h}_{n_r,n_t}$, which is of length $L$ with delay profile as in [23].

Since OFDM defines the data symbols in the frequency domain, Equation (2.3) is analogous to

$$
\tilde{\mathbf{y}}_{n_r} = \left( \sum_{n_t=1}^{N_t} \tilde{\mathbf{h}}_{n_r,n_t} \odot \tilde{\mathbf{x}}_{n_t} \right) + \tilde{\mathbf{n}} \tag{2.4}
$$

where all vectors in Equation (2.4) have a length of $N$, a tilde marks a frequency-domain vector, and $\odot$ denotes Hadamard (element-wise) multiplication. Each of the elements in $\tilde{\mathbf{h}}_{n_r,n_t}$ represents the frequency response at a particular subcarrier, $k$. If the $N_r$ symbols at a particular subcarrier are collected in the vector $\mathbf{y}_k$ and the channel mixing, per subcarrier, is represented by the matrix $\mathbf{H}_k$, then $\mathbf{y}_k$ is given by

$$
\mathbf{y}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{n}_k \tag{2.5}
$$

where $\mathbf{y}_k = [y_{k,1}\ \cdots\ y_{k,N_r}]^T$, $\mathbf{H}_k \in \mathbb{C}^{N_r \times N_t}$ represents the mixing matrix for the MIMO channel on the $k$th subcarrier, $\mathbf{x}_k = [x_{k,1}\ \cdots\ x_{k,N_t}]^T$, and $\mathbf{n}_k \in \mathbb{C}^{N_r \times 1}$ is noise.

Figure 2.2: In the frequency domain pilot tones are allocated. Blocks of $N_t \times N_t$ subcarriers with pilots along the diagonal are interspersed with blocks of $N_t \times N_{db}$ used for data transmission.

To obtain estimates of the transmitted data symbols, Frequency Domain Equalization (FDE) is used in OFDM systems, where simple division equalizes the SISO channel. For MIMO-OFDM systems matrix inversion is used. This requires the estimation of the $N$ channel matrices, $\mathbf{H}_k$, in Equation (2.5).
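As a minimal sketch of per-subcarrier equalization by matrix inversion, the following pure-Python example applies the least-squares (pseudo-inverse) form $(\mathbf{H}_k^H\mathbf{H}_k)^{-1}\mathbf{H}_k^H\mathbf{y}_k$ to one subcarrier. The channel values, the symbol values, the noise-free setting and all function names are assumptions made for illustration and are not taken from the simulations in this work.

```python
# Sketch: least-squares equalization of one subcarrier with Nt = 2, Nr = 3.
# The channel and symbols below are hypothetical example values.

def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))]
            for c in range(len(a[0]))]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Closed-form inverse, sufficient because Nt = 2 in this sketch."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ls_equalize(h_k, y_k):
    """Return x_hat = (H^H H)^-1 H^H y for an Nr x 2 channel matrix."""
    hh = hermitian(h_k)
    gram_inv = inverse_2x2(matmul(hh, h_k))
    x_hat = matmul(gram_inv, matmul(hh, [[v] for v in y_k]))
    return [row[0] for row in x_hat]

h_k = [[1 + 1j, 0.5 - 0.2j],
       [0.3j, 1 - 0.5j],
       [0.7 + 0j, -0.4 + 0.9j]]          # assumed 3 x 2 channel on subcarrier k
x_k = [1 + 1j, -1 + 1j]                  # assumed transmitted symbols
y_k = [sum(h_k[r][t] * x_k[t] for t in range(2)) for r in range(3)]  # no noise
x_hat = ls_equalize(h_k, y_k)
print(x_hat)
```

Without noise the estimate reproduces the transmitted symbols up to rounding; with noise or IBI added to `y_k`, the same matrix also shapes the error on each estimated symbol.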

2.3.3  Channel  Estimation  Algorithms. 

Two methods of CE in OFDM communications are pilot based (frequency domain) and preamble based (time domain) methods. This section discusses one type of pilot allocation, then moves to preamble based CE.

2.3.3.1 Frequency Domain CE.

In Frequency Domain Channel Estimation (FDCE), pilot-tones are used at the receiver to estimate the Channel State Information (CSI). The number of pilot-tones required has been shown to be the number of degrees of freedom of the frequency response. For now, $L$ is assumed to be known. Each pilot symbol is denoted by $P_{n_t,\ell}$.
Consider the first transmitter. The first pilot is placed in the first subcarrier, $P_{1,1}$. The first pilot for the second transmitter is placed on the second subcarrier, $P_{2,1}$, and the first subcarrier is zeroed for the second transmitter. This ensures that at the first subcarrier there is no interference between the two transmitters. Likewise, the first transmitter has its second subcarrier zeroed for the same reason. This scheme is extended to all transmitters and to all $L$ pilots. Once the pilots and zeroed subcarriers are assigned, the remaining blocks of subcarriers can be assigned the payload data. Each block is $N_{db}$ subcarriers wide for each transmitter.
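The allocation just described can be sketched as a small routine that labels every subcarrier of every transmitter as pilot, zeroed, or data. The sketch assumes that the $L$ pilot blocks are evenly spaced and that the data block width is $N/L - N_t$; both are assumptions made here for illustration, and the function name `allocate` is hypothetical.

```python
# Sketch of the interleaved pilot layout: each pilot block is Nt subcarriers
# wide with one pilot per transmitter on the "diagonal", followed by a data block.
# ASSUMPTION: N is divisible by L and the data block width is N / L - Nt.

def allocate(n_sub, n_t, n_taps):
    """Return one label list per transmitter: 'P' pilot, '0' zeroed, 'D' data."""
    if n_sub % n_taps != 0:
        raise ValueError("this sketch needs N divisible by L")
    n_db = n_sub // n_taps - n_t
    if n_db < 0:
        raise ValueError("not enough subcarriers for the pilot blocks")
    layout = []
    for tx in range(n_t):
        labels = []
        for _ in range(n_taps):
            # Pilot block: transmitter tx sends on its own offset, zeros elsewhere.
            labels += ["P" if offset == tx else "0" for offset in range(n_t)]
            # Data block shared by all transmitters (separated spatially later).
            labels += ["D"] * n_db
        layout.append(labels)
    return layout

layout = allocate(n_sub=16, n_t=2, n_taps=4)
for tx, labels in enumerate(layout, start=1):
    print(f"transmitter {tx}: {''.join(labels)}")
```

For 16 subcarriers, two transmitters and four pilots, each transmitter sees four pilot blocks, and no subcarrier carries a pilot from more than one transmitter.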

At the $n_r$-th receiver, the frequency response is estimated for each of the pilots allocated at the transmitter. Since the pilots are separated in frequency, frequency response estimates at the pilot subcarriers are calculated by division,

$$
p_{n_r,n_t}(\ell) = \frac{\tilde{y}_{n_r}\!\left(k_{n_t}(\ell)\right)}{P_{n_t,\ell}} \tag{2.6}
$$

where $\tilde{y}_{n_r}(k)$ is the received frequency-domain sample on subcarrier $k$, $p_{n_r,n_t}(\ell)$ are the $L$ frequency response estimates at the pilot-tone subcarriers, and the pilot subcarrier indices as a function of transmitter are denoted $k_{n_t}(\ell)$, such that $\tilde{x}_{n_t}(k_{n_t}(\ell)) = P_{n_t,\ell}$.

This method for estimating the frequency response coefficients is the same as in a SISO communication system, since frequency separation is relied on to reduce interference for pilot-tones. However, $\mathbf{p}_{n_r,n_t}$ contains only $L$ estimates, whereas the full $N$ frequency response coefficients are needed to spatially separate the payload data symbols.

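Interpolation is used to determine estimates for the rest of the $N$ subcarriers in the frequency response. For this, a sub-matrix of the $N$-point DFT matrix, $\mathbf{F}_N$, is used. $\mathbf{W}_{L,n_t}$ is defined as

$$
\mathbf{W}_{L,n_t} = \left[\mathbf{F}_N\right]_{\,0:L-1,\ \{k_{n_t}(\ell)\}} \tag{2.7}
$$

where $\mathbf{W}_{L,n_t}$ consists of the first $L$ rows of $\mathbf{F}_N$ and the columns that correspond to the pilot tone subcarrier indices as a function of the transmitter number. The $L$ frequency domain samples, $\mathbf{p}_{n_r,n_t}$, are used to estimate the impulse response,

$$
\hat{\mathbf{h}}_{n_r,n_t} = \mathbf{W}_{L,n_t}^{-1}\,\mathbf{p}_{n_r,n_t} \tag{2.8}
$$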
which is used to calculate the frequency response by zero-padding the $L$ samples with $N - L$ zeros and performing an FFT operation to obtain the $N$-sample frequency response. Equation (2.8) is used for each transmitter and receiver pair to obtain the full set of $N_r N_t$ frequency responses. These estimates are then used in the Least Squares (LS) solution to channel equalization,

$$
\hat{\mathbf{x}}_k = \left(\mathbf{H}_k^H \mathbf{H}_k\right)^{-1} \mathbf{H}_k^H\,\mathbf{y}_k. \tag{2.9}
$$
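The interpolation step can be sketched for a single transmitter and receiver pair as follows. The example assumes the DFT convention $H(k) = \sum_{\ell} h(\ell)e^{-j2\pi k\ell/N}$, evenly spaced pilot subcarriers, and noise-free pilot estimates; the small linear solver and all names are illustrative assumptions. The $L \times L$ system is written here with one row per pilot subcarrier, which is the transpose of the row and column selection described in the text.

```python
# Sketch: recover an L-tap impulse response from L pilot-tone estimates,
# then zero-pad and transform to get all N frequency response coefficients.
import cmath

def dft_entry(k, tap, n_sub):
    """Entry of the N-point DFT matrix (assumed unnormalized convention)."""
    return cmath.exp(-2j * cmath.pi * k * tap / n_sub)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def interpolate(pilot_est, pilot_idx, n_sub):
    """Return (impulse response estimate, full N-point frequency response)."""
    n_taps = len(pilot_idx)
    system = [[dft_entry(k, tap, n_sub) for tap in range(n_taps)]
              for k in pilot_idx]
    h_hat = solve(system, pilot_est)
    # Summing over the L taps equals an N-point DFT of h_hat padded with zeros.
    full = [sum(h_hat[tap] * dft_entry(k, tap, n_sub) for tap in range(n_taps))
            for k in range(n_sub)]
    return h_hat, full

n_sub = 16
h_true = [1 + 0j, 0.5j, -0.25 + 0j, 0.1 - 0.1j]       # assumed L = 4 channel
true_response = [sum(h_true[t] * dft_entry(k, t, n_sub) for t in range(4))
                 for k in range(n_sub)]
pilot_idx = [0, 4, 8, 12]                              # assumed pilot positions
pilot_est = [true_response[k] for k in pilot_idx]      # noise-free estimates
h_hat, full_response = interpolate(pilot_est, pilot_idx, n_sub)
```

In a practical receiver an FFT routine would replace the explicit sums; the direct form is used here only to keep the sketch self-contained.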

2.3.3.2 Time Domain CE.

Pilot-tones are not used in Time Domain Channel Estimation (TDCE); instead, the entire transmitted signal, a preamble, is known at the receiver. The benefit of this method is that there are many more known values with which to estimate the CSI; however, no data is transmitted during the preamble's duration. For this to be a valid form of CE, the channel coherence time must be longer than the preamble [24]. The longer the channel is coherent, the longer the current CSI estimate can be used to equalize received data.

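This section outlines the matrix structure used at the receiver to estimate the $N_r N_t$ impulse responses. This matrix, denoted $\mathbf{X}$, accounts for the $N_t$ transmit signals, the delay experienced by each transmitter, $\mathbf{d}$, and the number of taps in the impulse response to be estimated, $L$. First, $\mathbf{x}_{cp,n_t} = [x_{n_t}(N - N_{cp})\ \cdots\ x_{n_t}(N-1)\ \ x_{n_t}(0)\ \cdots\ x_{n_t}(N - 1)]^T$, with length $N' = N + N_{cp}$, is considered along with the delay experienced by each transmitter, which is used to prepend zeros to $\mathbf{x}_{cp,n_t}$:

$$
\mathbf{x}_{n_t,delay} = \begin{bmatrix} \mathbf{0}_{d(n_t)} \\ \mathbf{x}_{cp,n_t} \\ \mathbf{0}_{\max(\mathbf{d}) - d(n_t)} \end{bmatrix}. \tag{2.10}
$$

$\mathbf{X}_{cp}$ consists of $N_t$ columns and $N' + \max(\mathbf{d})$ rows:

$$
\mathbf{X}_{cp} = \begin{bmatrix} \mathbf{x}_{1,delay} & \cdots & \mathbf{x}_{N_t,delay} \end{bmatrix}. \tag{2.11}
$$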
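Next, zeros are prepended to $\mathbf{X}_{cp}$ for each value of $\ell = 0, 1, \ldots, (L - 1)$, with zeros appended so that every $\mathbf{X}_\ell$ has the same number of rows,

$$
\mathbf{X}_\ell = \begin{bmatrix} \mathbf{0}_{\ell \times N_t} \\ \mathbf{X}_{cp} \\ \mathbf{0}_{(L-1-\ell) \times N_t} \end{bmatrix}. \tag{2.12}
$$

The $L$ matrices that Equation (2.12) defines are then used to form $\mathbf{X}$:

$$
\mathbf{X} = \begin{bmatrix} \mathbf{X}_0 & \cdots & \mathbf{X}_{L-1} \end{bmatrix}. \tag{2.13}
$$

The structure of the $\mathbf{X}$ matrix then determines the structure of $\mathbf{h}_{n_r}$, where

$$
\mathbf{h}_{n_r} = \begin{bmatrix} \mathbf{h}_{n_r,0} \\ \vdots \\ \mathbf{h}_{n_r,L-1} \end{bmatrix}, \quad \text{where} \quad \mathbf{h}_{n_r,\ell} = \begin{bmatrix} h_{n_r,1,\ell} \\ \vdots \\ h_{n_r,N_t,\ell} \end{bmatrix}. \tag{2.14}
$$

The received signal, $\mathbf{y}_{n_r}$, is then modeled as

$$
\mathbf{y}_{n_r} = \mathbf{X}\mathbf{h}_{n_r} + \mathbf{n} \tag{2.15}
$$

with the dimensions of $\mathbf{X}$ being $(N' + (L - 1) + \max(\mathbf{d})) \times (N_t L)$, $\mathbf{h}_{n_r}$ being $(N_t L \times 1)$, and both $\mathbf{y}_{n_r}$ and $\mathbf{n}$ being $(N' + (L - 1) + \max(\mathbf{d})) \times 1$. To estimate $\mathbf{h}_{n_r}$ a LS solution is used [25], where

$$
\hat{\mathbf{h}}_{n_r} = \left(\mathbf{X}^H \mathbf{X}\right)^{-1} \mathbf{X}^H \mathbf{y}_{n_r}. \tag{2.16}
$$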

Equation (2.16) is used at each receiver to obtain an impulse response estimate for each transmitter and receiver pair. The equalization process uses this channel estimate in Equation (2.9).
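The structure of $\mathbf{X}$ can be checked with a short sketch that builds the matrix from delayed transmit signals and confirms that $\mathbf{X}\mathbf{h}_{n_r}$ equals the sum of the delayed signals convolved with their channels. The signals, integer sample delays and taps below are made-up values, and `build_x` is a hypothetical helper name; the LS inversion itself is omitted to keep the example short.

```python
# Sketch: build the time-domain estimation matrix X from delayed signals and
# verify y = X h against a direct sum of convolutions (noise-free).

def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def build_x(signals, delays, n_taps):
    """signals: Nt sequences of length N' (CP included); delays in samples."""
    n_t = len(signals)
    d_max = max(delays)
    # Columns of X_cp: d zeros, the signal, then max(d) - d zeros.
    cols = [[0] * d + list(s) + [0] * (d_max - d)
            for s, d in zip(signals, delays)]
    rows_cp = len(cols[0])
    x = [[0] * (n_t * n_taps) for _ in range(rows_cp + n_taps - 1)]
    for tap in range(n_taps):
        # X_tap is X_cp shifted down by `tap` rows; blocks sit side by side.
        for t in range(n_t):
            for r in range(rows_cp):
                x[r + tap][tap * n_t + t] = cols[t][r]
    return x

signals = [[1, 2, 3, 4], [5, 6, 7, 8]]      # assumed N' = 4 samples per user
delays = [0, 2]                              # assumed delays in samples
taps = [[1.0, 0.5], [-1.0, 0.25]]            # taps[t][l], assumed L = 2
n_taps = 2

x_mat = build_x(signals, delays, n_taps)
# Stack the taps tap-by-tap, with the Nt transmitters inside each tap block.
h_stack = [taps[t][l] for l in range(n_taps) for t in range(len(signals))]
y = [sum(row[c] * h_stack[c] for c in range(len(h_stack))) for row in x_mat]

y_ref = [0.0] * len(y)
for s, d, h in zip(signals, delays, taps):
    delayed = [0] * d + s + [0] * (max(delays) - d)
    for i, v in enumerate(convolve(delayed, h)):
        y_ref[i] += v
```

The matrix in this example has $N' + (L - 1) + \max(\mathbf{d}) = 7$ rows and $N_t L = 4$ columns.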
2.4 Multi-User TOA Estimation Algorithms

The system described in this section consists of $N_t$ users transmitting to one receiver. Each user is allocated a subset of the $N$ available subcarriers. If each user is allocated $N/N_t$ subcarriers, they may not be contiguous. This happens when the users are dynamically allocated subcarriers [26].

Consider a single user whose bits are mapped to a constellation, with the resulting symbols assigned to the subcarriers allotted to that user. $S_{n_t}$ denotes the set of subcarriers allocated to user $n_t$. The frequency domain signal for user $n_t$ is then

$$
\mathbf{x}_{n_t} = [x_1\ x_2\ \cdots\ x_N]^T \tag{2.17}
$$


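where $k$ indexes the subcarriers. A puncture matrix, $\boldsymbol{\Psi}_{n_t}$, is defined where the diagonal elements $\Psi_{k,k} = 0$ if $k \notin S_{n_t}$. The $N$-point DFT matrix is denoted $\mathbf{F}_N$ and the CP is added with the operator $\mathbf{Q}$,

$$
\mathbf{Q} = \begin{bmatrix} \mathbf{0}_{N_{cp} \times (N - N_{cp})} \ \ \mathbf{I}_{N_{cp}} \\ \mathbf{I}_N \end{bmatrix}. \tag{2.18}
$$

The baseband signal to be transmitted for each user is then given by

$$
\mathbf{x}_{cp,n_t} = \mathbf{Q}\,\mathbf{F}_N^H\,\boldsymbol{\Psi}_{n_t}\,\mathbf{x}_{n_t}. \tag{2.19}
$$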
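A minimal sketch of this transmit chain is given below: puncture the subcarriers outside the user's allocation, apply an inverse DFT, and prepend the CP. The $1/N$ scaling of the inverse DFT, the QPSK-like symbol values and the function names are assumptions made for illustration.

```python
# Sketch: build one user's baseband OFDM block (puncture -> IDFT -> add CP).
import cmath

def idft(freq):
    """Inverse DFT with an assumed 1/N normalization."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * m / n)
                for k in range(n)) / n for m in range(n)]

def build_symbol(symbols, allocated, n_cp):
    """Zero the subcarriers outside `allocated`, transform, and prepend the CP."""
    punctured = [s if k in allocated else 0j for k, s in enumerate(symbols)]
    time = idft(punctured)
    cyclic_prefix = time[len(time) - n_cp:]   # last n_cp samples of the block
    return cyclic_prefix + time

n_sub, n_cp = 8, 2
allocated = {1, 2, 5}                          # assumed subcarrier set for this user
symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]
x_cp = build_symbol(symbols, allocated, n_cp)
print(len(x_cp))
```

The first $N_{cp}$ samples of the result repeat its last $N_{cp}$ samples, which is the redundancy that CP-based TOA estimation relies on.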

To simplify the analysis, frequency flat fading is assumed; in the implementation in Section 4.3 this assumption is lifted. The Time of Arrival (TOA) for each user at the receiver is a function of two parameters. The first is the local clock difference between the transmitter and receiver, $\phi_{n_t} = \tau_{rx} - \tau_{n_t}$, where $\tau_{rx}$ denotes the local clock at the receiver and $\tau_{n_t}$ is the local clock at the transmitter. The second, $O_{n_t}$, is the propagation delay experienced by each user based on the position of the user. The impulse response experienced by each user is

$$
\mathbf{h}_{n_t} = h_{n_t}\,\delta(n - \theta_{n_t}) \tag{2.20}
$$

where $h_{n_t}$ is the complex scalar denoting the frequency flat fading channel between the $n_t$ user and the receiver. The total delay experienced at the receiver is $\theta_{n_t} = \phi_{n_t} + O_{n_t}$. The received signal $\mathbf{y}$ is then

$$
\mathbf{y} = \left( \sum_{n_t=1}^{N_t} \mathbf{h}_{n_t} \star \mathbf{x}_{cp,n_t} \right) + \mathbf{n} \tag{2.21}
$$

where $\mathbf{n}$ is complex AWGN and $\star$ denotes convolution.

Figure 2.3: Illustration of the van de Beek method for determining the TOA for OFDM symbols where the CP is leveraged.

2.4.1 van de Beek Method.

The blind estimation of the TOA for OFDM leverages the structure of the CP in the received waveform. The correlation between two blocks that are each $N_{cp}$ samples long and whose first samples are $N$ samples apart (a gap of $N - N_{cp}$ samples between them) peaks when the symbol is aligned with the sliding window indexed by $s$. An illustration of this method is shown in Figure 2.3.

The maximum likelihood estimator for $\theta$ is given by [27]

$$
\hat{\theta} = \arg\max_{\theta} \left\{ |\gamma(\theta)| - \rho\,\Phi(\theta) \right\} \tag{2.22}
$$

where

$$
\gamma(s) = \sum_{m=s}^{s+N_{cp}-1} y(m)\,y^*(m + N), \tag{2.23}
$$

$\rho$ is the magnitude of the correlation coefficient between $y(m)$ and $y(m + N)$, and

$$
\Phi(s) = \frac{1}{2} \sum_{m=s}^{s+N_{cp}-1} |y(m)|^2 + |y(m + N)|^2. \tag{2.24}
$$

Equation (2.22) is derived in [27] with consideration given to Carrier Frequency Offset (CFO), which is assumed to be zero here. The second term, $\rho\,\Phi(\theta)$, is assumed to be negligible in a system where the SNR is constant.

When determining the TOA for $N_t$ users, Equation (2.22) is used for each of the users. Once a maximum is found at an $s$, $\gamma(s + N_{cp})$ is set to zero and the next maximum is found. In this scheme, an estimate of delay cannot be matched to a particular user definitively. This would not be a valid method for estimating the TOA of the $N_t$ users for the Ex, since the delay for each user is needed. The next method described, the Acharya method, does provide user-specific delays.
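The single-user form of the CP correlation can be sketched as follows. The sketch evaluates only the $|\gamma(s)|$ term, neglecting $\rho\,\Phi$ as assumed above, on a noise-free block with a known integer delay. The block length, the delay, the random time-domain samples and the function names are assumptions chosen for illustration; multi-user peak picking is not shown.

```python
# Sketch: CP-based timing estimate using only the |gamma(s)| term.
import random

def cp_correlation(y, n_sub, n_cp):
    """|gamma(s)| for every window start s that fits inside y."""
    last_start = len(y) - n_sub - n_cp
    return [abs(sum(y[m] * y[m + n_sub].conjugate()
                    for m in range(s, s + n_cp)))
            for s in range(last_start + 1)]

def estimate_toa(y, n_sub, n_cp):
    """Index of the largest correlation magnitude."""
    metric = cp_correlation(y, n_sub, n_cp)
    return max(range(len(metric)), key=metric.__getitem__)

rng = random.Random(0)                        # fixed seed for repeatability
n_sub, n_cp, true_delay = 32, 8, 11           # assumed example parameters
body = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_sub)]
symbol = body[-n_cp:] + body                  # CP followed by the N samples
y = [0j] * true_delay + symbol + [0j] * 20    # delayed, noise-free block
estimate = estimate_toa(y, n_sub, n_cp)
print(estimate)
```

With noise, or with several overlapping users, the peak becomes less distinct and the energy term of Equation (2.22) may matter.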

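2.4.2 Acharya Method.

The log-likelihood function to be maximized is [26]

$$
\Lambda = \log p(\mathbf{y} \mid \theta_{n_t}). \tag{2.25}
$$

The Probability Density Function (PDF), given the delay $\theta_{n_t}$, is

$$
p(\mathbf{y} \mid \theta_{n_t}) = \frac{1}{\det(\pi\,\mathbf{C}_\theta)} \exp\left\{ -\mathbf{y}^H \mathbf{C}_\theta^{-1} \mathbf{y} \right\} \tag{2.26}
$$

where $\mathbf{C}_\theta$ is the covariance matrix as a function of the delay to be estimated. $\mathbf{C}_\theta$ is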

Cg  =  diag  1^1(9,  C,  I^-ej  (2.27) 

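where $\mathbf{C} = E\{\mathbf{y}\mathbf{y}^H\}$ is the received covariance matrix. In Ref. [26] the covariance matrix is broken into subblocks for the samples corresponding to the CP, denoted by $\mathbf{u}$, and the $N - N_{cp}$ remaining samples, denoted by $\mathbf{v}$. They then define $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ as:

$$
\mathbf{X} = E\{\mathbf{u}\mathbf{u}^H\}, \quad \mathbf{Y} = E\{\mathbf{u}\mathbf{v}^H\}, \quad \mathbf{Z} = E\{\mathbf{v}\mathbf{v}^H\}. \tag{2.28}
$$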
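$\mathbf{C}$ is then provided in [26] as:

$$
\mathbf{C} = \begin{bmatrix}
\sigma_n^2\mathbf{I}_{N_{cp}} + \mathbf{X} & \mathbf{Y} & \mathbf{X} \\
\mathbf{Y}^H & \sigma_n^2\mathbf{I}_{N-N_{cp}} + \mathbf{Z} & \mathbf{Y}^H \\
\mathbf{X} & \mathbf{Y} & \sigma_n^2\mathbf{I}_{N_{cp}} + \mathbf{X}
\end{bmatrix}. \tag{2.29}
$$

The estimate for $\theta$ is found by maximizing Equation (2.25), which is equivalent to the minimization

$$
\hat{\theta} = \arg\min_{\theta}\ \log\left(|\mathbf{C}_\theta|\right) + \mathbf{y}^H \mathbf{C}_\theta^{-1} \mathbf{y}. \tag{2.30}
$$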

2.5  FPGA  Overview 

An FPGA, as the name implies, is a field programmable gate array. A gate array is a series of logic blocks that are configured via software. The FPGA itself is a blank slate of logic blocks, and VHDL or Verilog is used to program these logic blocks to perform a specific task. This is in contrast to a Central Processing Unit (CPU) or Graphics Processing Unit (GPU), where instructions are provided to the processing unit and, ignoring parallel processing for now, executed in order: the instructions are given, executed, and the result is provided. In an FPGA the hardware itself is programmed. If a multiplier is needed, a multiplier is instantiated in the logic blocks. The power of the FPGA comes from the ability to perform calculations in parallel. Take Multiply Accumulate (MAC) operations as an example: if 256 MACs are needed in an FFT operation, 256 MACs are instantiated and can be completed concurrently.

Parallel processing occurs in a CPU or a GPU as well. These processors contain multiple units of a particular function, such as the MACs in the above example, but the number of MAC units is not variable as it is in an FPGA. Using an FPGA to develop hardware specific to the application provides faster computation times, making FPGA development a very powerful tool.

2.5.1  MIMO  Receiver  Latency  and  Throughput. 

A receiver is real-time compliant if each calculation block in the MIMO receiver can maintain a constant data rate. If the data rate going into a block is higher than the data rate coming out of it, the block is not real-time compliant. Possible solutions are an architecture change or simply a reduced data rate. The latter is not ideal since the reason for investigating MIMO communications is to maximize the data rate.

A  receiver  may  also  be  burst  mode  capable  where  cycles  of  communication  and 
waiting  are  provided  to  the  receiver.  During  the  communication  section,  data  is  transmitted 
and  the  receiver  synchronizes  to  the  data  and  begins  the  demodulation  process.  The 
demodulation  process  continues  into  the  waiting  section  to  allow  for  further  computations. 
The  duty  cycle  ratio  of  communication  and  waiting  times  is  known  at  the  transmitter  and 
does  not  overwhelm  the  receiver.  A  real-time  compliant  receiver  may  also  operate  in  burst 
mode  but  the  transmitter  may  send  data  as  often  as  needed. 

If the computational complexity of the MIMO receiver is too high for a specific field grade platform, logging data and using a supercomputer with more resources is possible [28]. This situation is not investigated in this paper, but is included for completeness. This paper first focuses on designing a MIMO receiver that is real-time compliant. This design criterion is then relaxed to provide a reduced data rate Burst Mode communication system in which fewer resources are used, allowing for a more flexible implementation.

2.5.2  FPGA  Resources. 

Figure 2.4 shows a diagram of how the FPGA fabric is laid out. Logic blocks are shown in a grid pattern with dedicated multipliers and Random Access Memory (RAM) blocks. The dedicated multipliers save logic blocks since multiplication is an expensive and frequently used operation. The operands and results are saved in RAM blocks on the FPGA next to the logic blocks.

In complex designs, careful consideration is given to resource utilization. For example, the logic blocks that make up the FFT operation are instantiated on the FPGA close to the RAM that stores the time domain data. This avoids the issue of logic switches being used for transporting time domain values to the FFT block. However, in practice transceiver designs are complex and it is unavoidable to use some logic switches to transport results from one process to another.

Figure 2.4: FPGA fabric with resources highlighted [29].

2.5.3  WARP  Board. 

The WARP Board is a Software Defined Radio (SDR) developed by Rice University for physical layer development. The boards have been used in some network protocol research [30], but most experiments using the WARP Boards are in the area of wireless communications, for example, WARPLab [30]. A major benefit of the boards for MIMO communications is the higher end clock support, with the clock shared among the four radio cards.

The WARP Boards used in this project are the second version Rice has developed, which has an FPGA and an embedded processor on the board. The FPGA is the Xilinx Virtex-4, which also contains IBM's PowerPC (PPC) 405 processor. The FPGA and PPC work in tandem to control a multitude of peripheral devices. These devices include [30]

1. DDR2 SO-DIMM Slot (2 GB SO-DIMM Installed)

2. Daughter-card Slots

3. Push Button, LEDs, DIP Switches, and Hex Displays

4. Ethernet Port

5. USB and Serial UART

6. 16-Bit Digital I/O Header Pins

7. Multi-Gigabit Transceivers, HSSDC2, SATA, and SFP

Item 2 provides an interface to a radio card, where four radio cards can be installed on the main board. The radio cards are the RF front-ends that allow the FPGA to transmit and receive signals in the ISM bands, in the 2.4 and 5 GHz ranges. The radio card specifications are described in Table 2.2.


Table 2.2: Radio card operating parameters

Parameter Name                  Parameter Value
Digital to Analog Converters    160 MS/s, 16-bit
Analog to Digital Converters    65 MS/s, 14-bit
Dual Band Operation             2.4 - 2.48 GHz and 4.9 - 5.875 GHz
Bandwidth                       40 MHz
RSSI Range                      60 dB
Tx Power Control Range          30 dB
Rx Gain Control Range           93 dB
Output Power at Full Gain       18 dBm
2.5.4 WARPLab.

WARPLab is a project developed by Rice University and maintained in an open-source environment [30]. It allows MATLAB® to control the WARP Board via an Ethernet connection. More than one board can be controlled by a single computer through an Ethernet switch and a simple local network.

The WARPLab reference design provides a vehicle for baseband signals designed in MATLAB to be loaded onto the board. The reference design also allows transmit and receive parameters such as center frequency and gain to be set, and then allows transmit and receive operation to commence. With this system the transmitter and receiver synchronization is rough due to Ethernet jitter, but it provides a mechanism for real world transmitted and received signals to be analyzed while remaining in a MATLAB environment.
III. Covert MIMO Communication


In Chapter I a scenario is described where information is shared between many entities. The scenario is further described as a hostile environment. The topology for this hostile environment is outlined in Section 2.3.1. For this system a mechanism is needed to provide a level of security.

In  this  chapter,  IBI  is  used  to  degrade  the  eavesdropping  receiver’s  performance  by 
designing  the  system  with  a  CP  that  is  just  long  enough  for  synchronized  communication 
among  many  uplink  users.  Synchronized  communication  is  done  in  cell  phone  networks 
to  provide  resource  blocks  to  users  with  division  in  time  and  frequency  [22],  and  a  similar 
process  is  used  to  ensure  zero  delay  between  user  waveforms. 

Characterization of the effect of IBI is developed as a function of delay based on the distribution of the users' locations. In the system analyzed, the delay between transmit waveforms is the only source of IBI. The distribution of delays is calculated for normally and uniformly distributed transmitter locations.

Section 3.1 analyzes the distributions of delays in the topology defined in Section 2.3.1, then the SINR is derived for the MIMO communication system. The SINR and BER performance as a function of angle is simulated and compared to the analytical SINR and BER in Section 3.2.

3.1  SINR  and  Bit  Error  Rate  Derivation 

In this section, the distribution of delays is derived as a function of transmitter location. Since delay induces IBI on the system, a derivation of SINR for the MIMO system follows the derivation of the delay distribution. The expected SINR for a given channel is calculated. Using the SINR, the theoretical BER is calculated under the assumption that the noise including IBI follows a Gaussian distribution.

3.1.1 Distributions of Delays.

Asynchronous delay between transmissions effectively makes the impulse response longer. A longer impulse response may violate $N_{cp} \geq L - 1$, which would degrade the fidelity of the communication system. The aim of this section is to characterize the delay as a function of $\theta$. Once this is accomplished, the delay is used in calculating the interference induced by the delay.
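The effect of relative delay on the CP requirement can be illustrated with a short sketch. It assumes that a relative delay between transmitters lengthens the effective impulse response by the delay spread expressed in samples, rounded up, and it uses a hypothetical sample rate; neither the numbers nor the function names come from the system studied here.

```python
# Sketch: does the CP still cover the channel once relative delays are added?
import math

def effective_channel_length(n_taps, delays_s, sample_rate_hz):
    """Channel length in samples after adding the spread of arrival times.

    ASSUMPTION: the spread between earliest and latest arrival, rounded up
    to whole samples, adds directly to the L-tap channel length.
    """
    spread_s = max(delays_s) - min(delays_s)
    return n_taps + math.ceil(spread_s * sample_rate_hz)

def cp_is_sufficient(n_cp, n_taps, delays_s, sample_rate_hz):
    """Check the condition N_cp >= L_eff - 1."""
    l_eff = effective_channel_length(n_taps, delays_s, sample_rate_hz)
    return n_cp >= l_eff - 1

SAMPLE_RATE_HZ = 40e6                          # hypothetical sample rate
# Nearly synchronized arrivals (as seen by the intended receiver).
synced_ok = cp_is_sufficient(4, 4, [0.0, 10e-9], SAMPLE_RATE_HZ)
# Larger relative delays (as an off-axis eavesdropper might see).
offset_ok = cp_is_sufficient(4, 4, [0.0, 100e-9], SAMPLE_RATE_HZ)
print(synced_ok, offset_ok)
```

A CP chosen to be just long enough for the synchronized direction therefore leaves no margin for the additional delay spread seen from other directions.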

In Figure 2.1 the ROA is defined as the area in which a transmitter can be located. $N_t$ transmitters are considered and their locations in the ROA are randomly distributed. In the following sections the formulation of the relative distances that affect delay is introduced. Based on these distances the distributions of arrival times, with respect to the center of the ROA, are considered with the transmitters being uniformly distributed and normally distributed. The distribution of arrival times as a function of $\theta$ is derived and compared to simulations.

3.1.1.1 General Delay.

The absolute delay between a transmitter and the receiver is not of concern, but the relative delay between two transmissions needs to be characterized. For this, the difference in path length between two transmitters is the important parameter. To simplify the following derivation the small angle approximation is made, as shown in Figure 3.1. The impact of this assumption is further explored in Sections 3.1.1.4 and 3.2.1.3. This assumption reduces the delay dependence to only the $x$ dimension; the $y$ dimension is assumed to be negligible.

Recall that the transmitters add delays so that the received signals are synchronized in the $\theta = 0$ direction. For synchronization purposes, $D_1$ denotes the delays in the $\theta = 0$ direction, which simply relates the $x$ coordinate to propagation time:

$$
D_1 = \frac{x}{c}, \tag{3.1}
$$

where $c$ is the speed of light. Since $y$ is considered to be small in relation to the distance the signal propagates, Equation (3.1) does not depend on $y$. In Equation (3.1), $x$ represents the $x$ coordinate of a randomly located transmitter. Values for $D_1$ are negative when the transmitter is closer to the receiver and positive when the transmitter is further from the receiver, with respect to the origin (ROA center).

Figure 3.1: Depiction of small angle assumption. $D_1 = \frac{1}{c}(d_r - d_{r,0}) \approx \frac{x}{c}$

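For $\theta \neq 0$, $D_2$ denotes the delays in the direction of $\theta$. For this, a coordinate rotation is needed for the analysis, where:

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \tag{3.2}
$$

In the rotated coordinates, $D_2$ is calculated in the same way as $D_1$; however, the dependence on $x$ and $y$ is needed explicitly:

\begin{align}
D_2 &= \frac{x'}{c} \tag{3.3} \\
&= \frac{1}{c}\left(x\cos\theta + y\sin\theta\right). \tag{3.4}
\end{align}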
The equations for $D_1$ and $D_2$ are used in determining the delay distribution for all $\theta$. $D_2$ represents the delays experienced as a function of $\theta$ without synchronization. By subtracting the delay in the $\theta = 0$ direction, namely $D_1$, the system is then synchronized in the $\theta = 0$ direction. This difference, $\Delta$, is defined as

\begin{align}
\Delta &= D_2 - D_1 \tag{3.5} \\
&= \frac{\cos\theta - 1}{c}\,x + \frac{\sin\theta}{c}\,y. \tag{3.6}
\end{align}

Here, $\Delta$ is a function of $\theta$ and represents the arrival time in seconds in relation to a hypothetical transmitter placed at the origin. For one realization of $\Delta$, one transmitter is considered with coordinates $(x_{tx}, y_{tx})$. In the case where $\theta = 0$, this yields $\Delta = \frac{1}{c}\left(x_{tx}\cos(0) + y_{tx}\sin(0)\right) - \frac{x_{tx}}{c} = 0$, which holds $\forall\, x_{tx}, y_{tx}$.
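As a quick numerical illustration of the relative delay under the small angle assumption, the sketch below evaluates $\Delta = \frac{1}{c}\left[(\cos\theta - 1)x + y\sin\theta\right]$ for one transmitter position. The coordinates are arbitrary example values and the function name is hypothetical.

```python
# Sketch: relative arrival time of one transmitter as seen from direction theta,
# after synchronization in the theta = 0 direction (small angle assumption).
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def relative_delay(x, y, theta):
    """Delay in seconds relative to a transmitter at the ROA center."""
    return ((math.cos(theta) - 1.0) * x + math.sin(theta) * y) / SPEED_OF_LIGHT

x_tx, y_tx = 30.0, 40.0                       # assumed position in metres
delay_on_axis = relative_delay(x_tx, y_tx, 0.0)
delay_broadside = relative_delay(x_tx, y_tx, math.pi / 2)
print(delay_on_axis, delay_broadside)
```

The delay is exactly zero in the synchronized direction for any transmitter position, while at $\theta = \pi/2$ it corresponds to a path difference of $y - x$, here 10 m.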

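For $\theta \neq 0$ and with random transmitter locations, $\Delta$ is also random. To determine the distribution of $\Delta$ the distribution of $x$ is needed. The PDF of $\Delta$, $f(\Delta)$, is related to $f(x)$ by [39]

$$
f(\Delta) = f_x\big(g(\Delta)\big)\left|\frac{d g(\Delta)}{d\Delta}\right|. \tag{3.7}
$$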

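To determine $g(\Delta)$, Equation (3.6) is considered, where Equation (3.6) defines a line for a given $\Delta$, shown in Figure 3.2. In general, the distance between the line defined by $\Delta$ and the origin is [40]

\begin{align}
r &= \frac{\Delta c}{\sqrt{(\cos\theta - 1)^2 + \sin^2\theta}} \tag{3.8} \\
&= c\,\alpha\,\Delta, \tag{3.9}
\end{align}

where

$$
\alpha = \frac{1}{\sqrt{(\cos\theta - 1)^2 + \sin^2\theta}}. \tag{3.10}
$$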

3.1.1.2 Uniformly Distributed Transmitters.

The general analysis above does not consider a specific distribution for $x$ and $y$. For a ROA of radius $R$, the PDF of the $x$ and $y$ coordinates is

$$
f(x, y) = \frac{1}{\pi R^2}, \quad \text{where } x^2 + y^2 \leq R^2. \tag{3.11}
$$

Figure 3.2: Depiction of the transmitters' positions with delay $\Delta(\theta)$

Since $\Delta$ is a function of $x$ only, and $y$ is considered negligible, the PDF of $x$ is

\begin{align}
f(x) &= \int_{-\infty}^{\infty} f(x, y)\, dy \tag{3.12} \\
&= \int_{-\sqrt{R^2 - x^2}}^{\sqrt{R^2 - x^2}} \frac{1}{\pi R^2}\, dy \tag{3.13} \\
&= \frac{2\sqrt{R^2 - x^2}}{\pi R^2}. \tag{3.14}
\end{align}

Substituting into Equation (3.7):

$$
f(\Delta) = \frac{2 c \alpha \sqrt{R^2 - (c \alpha \Delta)^2}}{\pi R^2} \tag{3.15}
$$
which is the distribution of the delays at the Ex with respect to the Rx for a given $\theta$. Further analysis of the delay is discussed in Section 3.2.1.
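The delay distribution for uniformly placed transmitters can be checked numerically with a small Monte Carlo sketch. It draws transmitter positions uniformly over a disc, computes $\Delta$ from the small angle expression $\frac{1}{c}\left[(\cos\theta - 1)x + y\sin\theta\right]$, and compares the sample variance with the value $\left(R/(2c\alpha)\right)^2$ implied by the semicircular density $f(\Delta) = 2c\alpha\sqrt{R^2 - (c\alpha\Delta)^2}/(\pi R^2)$. The radius, angle, sample count and seed are arbitrary assumptions, and the code is not the simulation used in this work.

```python
# Sketch: Monte Carlo check of the delay PDF for uniformly placed transmitters.
import math
import random

C = 299_792_458.0  # speed of light, m/s

def alpha(theta):
    """Scale factor 1 / sqrt((cos(theta) - 1)^2 + sin(theta)^2)."""
    return 1.0 / math.sqrt((math.cos(theta) - 1.0) ** 2 + math.sin(theta) ** 2)

def delay_pdf_uniform(delta, theta, radius):
    """Semicircular delay density; zero outside its support."""
    u = C * alpha(theta) * delta
    if abs(u) > radius:
        return 0.0
    return 2.0 * C * alpha(theta) * math.sqrt(radius ** 2 - u ** 2) / (math.pi * radius ** 2)

def sample_delays(count, theta, radius, rng):
    """Rejection-sample points in the disc and map them to delays."""
    delays = []
    while len(delays) < count:
        x, y = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            delays.append(((math.cos(theta) - 1.0) * x + math.sin(theta) * y) / C)
    return delays

theta, radius = math.radians(60.0), 100.0     # assumed angle and ROA radius
rng = random.Random(1)
delays = sample_delays(20000, theta, radius, rng)
mean = sum(delays) / len(delays)
sample_var = sum((d - mean) ** 2 for d in delays) / len(delays)
theory_var = (radius / (2.0 * C * alpha(theta))) ** 2

# Midpoint-rule integral of the PDF over its support; should be close to 1.
limit = radius / (C * alpha(theta))
steps = 2000
width = 2.0 * limit / steps
pdf_area = sum(delay_pdf_uniform(-limit + (i + 0.5) * width, theta, radius)
               for i in range(steps)) * width
print(sample_var / theory_var, pdf_area)
```

The ratio of sample to theoretical variance should be close to one and the integrated density close to unity, which supports using the one-dimensional projection even though both coordinates enter the sampled delay.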

3.1.1.3 Gaussian Distributed Transmitters.

The definition of the ROA is slightly different for the Gaussian distributed transmitters, in the sense that the Gaussian distribution has infinite support. For this, $\sigma_r$ is introduced as the standard deviation used for the Gaussian PDF. The distribution of the coordinates of the transmitters is:

$$
f(x, y) = \frac{1}{2\pi\sigma_r^2} \exp\left\{ -\frac{1}{2\sigma_r^2}\left(x^2 + y^2\right) \right\}. \tag{3.16}
$$

Since the coordinates are independent, and the $x$ coordinate is the only coordinate of interest, the scalar Gaussian distribution of $x$ is used:

$$
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_r} \exp\left\{ -\frac{1}{2\sigma_r^2}\,x^2 \right\}. \tag{3.17}
$$

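Here, Equation (3.7) and $x' = c\alpha\Delta$ still hold. Substitution provides the distribution of $\Delta$ when the transmitters are distributed following a Gaussian distribution:

\begin{align}
f(\Delta) &= \frac{c\alpha}{\sqrt{2\pi}\,\sigma_r} \exp\left\{ -\frac{c^2\alpha^2}{2\sigma_r^2}\,\Delta^2 \right\} \tag{3.18} \\
&= \frac{1}{\sqrt{2\pi}\,\frac{\sigma_r}{c\alpha}} \exp\left\{ -\frac{\Delta^2}{2\left(\frac{\sigma_r}{c\alpha}\right)^2} \right\} \tag{3.19} \\
&= \frac{1}{\sqrt{2\pi}\,\sigma_\Delta} \exp\left\{ -\frac{\Delta^2}{2\sigma_\Delta^2} \right\} \tag{3.20}
\end{align}

where $\sigma_\Delta = \frac{\sigma_r}{c\alpha}$.

3.1.1.4 Close vs. Far Receiver and Eavesdropper.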


Equations (3.1) and (3.4) represent the delay in seconds when the $y$ coordinate is considered to be negligible. In the following, this assumption is removed and we show why this approach is intractable. Figure 3.3 shows the distances that are of interest.

Figure 3.3: Illustration of distances of interest: $d_{r,0}$, $d_r$, $d_{e,0}$, and $d_e$

The general distance formulas for the four distances in Figure 3.3 are

\begin{align*}
d_{r,0} &= \sqrt{x_r^2 + y_r^2}, \\
d_r &= \sqrt{(x - x_r)^2 + (y - y_r)^2}, \\
d_{e,0} &= \sqrt{x_e^2 + y_e^2}, \\
d_e &= \sqrt{(x - x_e)^2 + (y - y_e)^2}.
\end{align*}

The delays in seconds, analogous to Equations (3.1) and (3.4), are defined here:

\begin{align*}
D_1 &= \frac{1}{c}\left(d_r - d_{r,0}\right), \\
D_2 &= \frac{1}{c}\left(d_e - d_{e,0}\right).
\end{align*}
$\Delta$ is defined the same way:

$$
\Delta = D_2 - D_1.
$$

Equations for $x$ and $y$ are needed for substitution into the general form of Equation (3.7). To do this, the square roots are removed by solving for the values that do not depend on the transmitter location, $d_{r,0}$ and $d_{e,0}$, then squaring the resulting equations. First, $c\Delta$ is found explicitly,

$$
c\Delta = d_e - d_{e,0} - d_r + d_{r,0}. \tag{3.21}
$$

Let $p = (c\Delta + d_{e,0} - d_{r,0})$. Removing all square roots from Equation (3.21) leads to a fourth order polynomial.

Unfortunately, this fourth order polynomial is not solvable for $x$ and $y$, so this approach is analytically intractable. However, simulations of this system are designed, and are presented in Section 3.2.1.3.
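Although the exact expression is not used analytically, it is straightforward to evaluate numerically. The sketch below compares the exact relative delay, computed from the four distances, with the small angle approximation $\frac{1}{c}\left[(\cos\theta - 1)x + y\sin\theta\right]$. It assumes that the Rx and the Ex are at the same distance from the ROA center and that they sit at $-d(\cos\theta, \sin\theta)$, a placement chosen here so that the far-field limit matches the signs of Equations (3.1) and (3.4). These choices, the numbers and the names are assumptions for illustration only.

```python
# Sketch: exact relative delay from distances vs. the small angle approximation.
import math

C = 299_792_458.0  # speed of light, m/s

def delay_approx(x, y, theta):
    """Small angle (far receiver) relative delay in seconds."""
    return ((math.cos(theta) - 1.0) * x + math.sin(theta) * y) / C

def delay_exact(x, y, theta, dist):
    """Relative delay from the exact distances.

    ASSUMPTION: Rx at (-dist, 0) and Ex at -dist * (cos(theta), sin(theta)),
    both at distance `dist` from the ROA center.
    """
    rx = (-dist, 0.0)
    ex = (-dist * math.cos(theta), -dist * math.sin(theta))
    d_r = math.hypot(x - rx[0], y - rx[1])
    d_e = math.hypot(x - ex[0], y - ex[1])
    d1 = (d_r - dist) / C        # delay toward Rx relative to the ROA center
    d2 = (d_e - dist) / C        # delay toward Ex relative to the ROA center
    return d2 - d1

x_tx, y_tx, theta = 30.0, 40.0, math.radians(60.0)   # assumed example values
far_error = abs(delay_exact(x_tx, y_tx, theta, 1e6) - delay_approx(x_tx, y_tx, theta))
near_error = abs(delay_exact(x_tx, y_tx, theta, 150.0) - delay_approx(x_tx, y_tx, theta))
print(far_error, near_error)
```

For these example values the approximation error is on the order of picoseconds when the receivers are 1000 km away and on the order of tens of nanoseconds when they are 150 m away, which indicates when the small angle assumption can be expected to hold.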

3.1.2 SINR Derivation.

In MIMO communication systems each of the $N_r N_t$ transmitter and receiver pairs must be considered because each pair contributes to the overall IBI. For this reason $i(n)$, $y(n)$ and $q(n)$ are defined for each transmitter and receiver pair.

L-l 

i=0 

L-l 

ynr,ntiti)  ^  ^)> 

e=o 

dnr,nt(,ti)  inr,nt(ti)  y  nr,ntiP-)  ■ 

The $q_{n_r,n_t}(n)$ samples are put into a vector $\mathbf{q}_{n_r,n_t} = [q_{n_r,n_t}(0)\ q_{n_r,n_t}(1)\ \cdots\ q_{n_r,n_t}(N - 1)]^T$, and $\tilde{\mathbf{q}}_{n_r,n_t}$ describes $\mathbf{q}_{n_r,n_t}$ in the frequency domain. However, the equalization is more complicated in the MIMO system. Just as in Section 2.3.3, where a LS solution is used to estimate the channel, an LS approach is used to equalize the $N_r$ received symbols on subcarrier $k$, producing the $N_t$ estimated transmitted symbols, $\hat{\mathbf{x}}_k$, at each subcarrier by inverting the channel matrix:

$$
\hat{\mathbf{x}}_k = \left(\mathbf{H}_k^H \mathbf{H}_k\right)^{-1} \mathbf{H}_k^H\,\mathbf{y}_k.
$$

To  characterize  IBI  specifically,  the  channel  is  assumed  to  be  known  for  each  subcarrier. 
Section  2.3.3  considers  a  CE  process  in  an  IBI  inducing  environment.  The  error  in  the 
frequency  domain  is  the  difference  between  the  estimated  symbols  and  the  true  symbols: 

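$\mathbf{e}_k = \hat{\mathbf{x}}_k - \mathbf{x}_k$

$\quad = \mathbf{H}_k^{\dagger}(\mathbf{H}_k \mathbf{x}_k + \underline{\mathbf{q}}_k) - \mathbf{x}_k$

$\quad = \mathbf{x}_k + \mathbf{H}_k^{\dagger}\underline{\mathbf{q}}_k - \mathbf{x}_k$

$\quad = \mathbf{H}_k^{\dagger}\underline{\mathbf{q}}_k,$  (3.22)

where $\mathbf{H}_k^{\dagger} = (\mathbf{H}_k^H \mathbf{H}_k)^{-1}\mathbf{H}_k^H$ is the LS equalizer applied on subcarrier k.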

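The error vector calculated in Equation (3.22) is an ($N_t \times 1$) vector representing the error
induced on the system by IBI for each estimated transmit data symbol. For further analysis
consider an element of $\mathbf{e}_k$,

$e_{n_t}(k) = \sum_{n_r=1}^{N_r} w_{n_t,n_r,k}\, \underline{q}_{n_r,k},$

where $w_{n_t,n_r,k}$ is the $(n_t, n_r)$ element of $\mathbf{H}_k^{\dagger}$. To find the PSD of the error, $S_{e_{n_t},k}(f)$,
the autocorrelation function is needed. This is found to be

$r_{e_{n_t}}(m) = E\{e_{n_t}(k)\, e_{n_t}^*(k+m)\}$

$\quad = E\left\{\left(\sum_{n_{r1}=1}^{N_r} w_{n_t,n_{r1},k}\, \underline{q}_{n_{r1},k}\right)\left(\sum_{n_{r2}=1}^{N_r} w_{n_t,n_{r2},(k+m)}\, \underline{q}_{n_{r2},(k+m)}\right)^*\right\}$

$\quad = \sum_{n_{r1}=1}^{N_r}\sum_{n_{r2}=1}^{N_r} w_{n_t,n_{r1},k}\, w_{n_t,n_{r2},(k+m)}^*\, E\{\underline{q}_{n_{r1},k}\, \underline{q}_{n_{r2},(k+m)}^*\}$  (3.23)

$\quad \approx \sum_{n_r=1}^{N_r} |w_{n_t,n_r,k}|^2\, r_{q_{n_r}}(m).$  (3.24)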
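The cross-correlation term in Equation (3.23) considers correlation between received
signals. In the case where $n_{r1} = n_{r2}$ the positive values add constructively. When $n_{r1} \neq n_{r2}$
the complex values add destructively, which makes Equation (3.24) a good approximation.
Finally, the PSD of the error as a function of subcarrier and transmitter is

$S_{e_{n_t},k}(f) = \sum_{m=-\infty}^{\infty} r_{e_{n_t}}(m)\, e^{-j2\pi f m}$

$\quad \approx \sum_{m=-\infty}^{\infty} \sum_{n_r=1}^{N_r} |w_{n_t,n_r,k}|^2\, r_{q_{n_r}}(m)\, e^{-j2\pi f m}$

$\quad = \sum_{n_r=1}^{N_r} |w_{n_t,n_r,k}|^2 \sum_{m=-\infty}^{\infty} r_{q_{n_r}}(m)\, e^{-j2\pi f m}$

$\quad = \sum_{n_r=1}^{N_r} |w_{n_t,n_r,k}|^2\, S_{q_{n_r}}(f).$  (3.25)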


$S_{e_{n_t},k}(f)$ characterizes the noise as a function of subcarrier at the receiver after
equalization. The SINR per subcarrier is simply the ratio of the signal power to $S_{e_{n_t},k}(f)$.
The performance in terms of BER is related to SINR for 4-QAM by [24]:

$P_b = Q\left(\sqrt{\frac{2E_b}{N_0}}\right),$  (3.26)

where the SINR is the ratio of the energy per bit, $E_b$, and the noise power, $N_0$ [24]. Using the
noise and interference derivation in Equation (3.25) the BER can be found per subcarrier.
However, Equation (3.26) assumes that the noise in the system is Gaussian.
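The mapping from per-subcarrier SINR to BER can be sketched in a few lines of Python. This is only an illustration of the 4-QAM relation cited above, under the same Gaussian assumption; the function names are hypothetical, and each SINR value is treated as $E_b/N_0$, following the definition in the text.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_4qam(ebn0_linear):
    """Bit error rate of 4-QAM for a given Eb/N0 on a linear scale."""
    return q_function(math.sqrt(2.0 * ebn0_linear))

def average_ber(sinr_db_per_subcarrier):
    """Average the per-subcarrier BER.

    Assumption: each entry is the SINR in dB on one subcarrier and is
    used directly as Eb/N0, as in the text.
    """
    bers = [ber_4qam(10.0 ** (s / 10.0)) for s in sinr_db_per_subcarrier]
    return sum(bers) / len(bers)
```

Averaging the BER rather than the SINR matters when the SINR varies strongly across subcarriers, because the worst subcarriers dominate the average error rate.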


3.2  Simulations 

In this section, the theoretical distribution of delays and the MIMO SINR and BER are
confirmed with MATLAB simulations.

3.2.1  Distribution  of  Delays. 

MATLAB is used to simulate the arrival times of N transmitters in the topology
described in Section 2.3.1. Table 3.1 shows the parameters used for the simulations. In
the following sections these parameters are used to simulate the distribution of arrival
times for the Uniform and Gaussian distributed transmitters. Section 3.1.1.4 demonstrated
that removing the small angle approximation is not mathematically
tractable. For this reason, two propagation distances are considered. The first, $d_p$, is used when
the small angle approximation applies. The resulting distribution of arrival times is compared to that of
the system that has the receivers $d_c$ meters from the center of the ROA. This issue is addressed
via simulation and conclusions about the differences in distributions are discussed.


Table 3.1: Simulation parameters for Figs. 3.4-3.7

Variable          Description                          Value
R                 Radius of ROA                        484.8 m
$N_t$             Number of transmitters               10,000
[illegible]       Var. of X (Normal Distro)            [illegible]
$\theta_{step}$   Increment of $\theta$ investigated   $\pi/8$
$d_p$             Far Rx distance from (0,0)           $R \times 10^{[\text{exponent illegible}]}$
$d_c$             Close Rx distance from (0,0)         $R + 0.1R$

3.2.1.1 Uniform Distributed Transmitters.

Figure 3.4 shows the theoretical and simulated distribution of relative arrival times.
The graph shows the distributions for $\theta$ = {22.5°, 78.75°, 180°}. The simulation of relative
arrival times confirms the analysis in Section 3.1.1.2.

3.2.1.2 Gaussian Distributed Transmitters.

Figure 3.5 shows the distribution of $\Delta$ as a function of $\theta$, similar to the Uniform case.
Here, the transmitters are distributed normally, with zero mean and variance $\sigma_r^2$.

3.2.1.3  Close  vs.  Far  Receiver  and  Eavesdropper. 

In Section 3.1.1.4 the distribution of the relative arrival times when the receivers
are close enough to violate the small angle approximation assumption was shown to be
intractable. This section provides results through simulation, for both the Uniform and
Gaussian distributed transmitters.

Figure 3.4: Delay for Uniform transmitter locations with the Ex $d_p$ meters from the center
of the ROA.

Figure 3.5: Delay for Gaussian transmitter locations with the Ex $d_p$ meters from the center
of the ROA.

Figure 3.6 shows this effect for the Uniformly distributed transmitters. The Simulation
line is compared to the Theoretical line from Figure 3.4, which depicts the delay distribution
under the far assumption. A slight disagreement appears and becomes more pronounced as $\theta$ approaches 180°.
However, when the transmitters follow a Gaussian distribution, the small angle assumption
does not play as large a role. This is because the transmitters further away from the origin
of the ROA play the largest role in maximum delay. For the Gaussian distributed locations,
there is a lower chance of having a transmitter far from the origin compared to the
Uniform case. This is a function of the standard deviation, which is set to $\sigma_r$ = [value illegible].
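The close versus far comparison can be reproduced in outline with a small Monte Carlo experiment. The Python sketch below is not the MATLAB simulation used for the figures; it assumes that the receiver sits at $\theta = 0$, that the eavesdropper sits at angle $\theta$ at the same distance from the origin, and that the far distance is $R \times 10^4$ (a hypothetical value, since the exponent in Table 3.1 is illegible).

```python
import math
import random

C = 299_792_458.0  # speed of light in m/s

def sample_uniform_disk(radius, rng):
    """Draw one point uniformly over a disk (the sqrt keeps the area density flat)."""
    r = radius * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return (r * math.cos(phi), r * math.sin(phi))

def relative_delay(tx, rx, ex):
    """Delta = D2 - D1 in seconds, using exact path lengths."""
    d1 = (math.dist(tx, rx) - math.hypot(*rx)) / C
    d2 = (math.dist(tx, ex) - math.hypot(*ex)) / C
    return d2 - d1

def simulate_delays(radius, node_distance, theta_deg, n_tx, seed=0):
    """Relative delays for n_tx uniformly placed transmitters."""
    rng = random.Random(seed)
    theta = math.radians(theta_deg)
    rx = (node_distance, 0.0)
    ex = (node_distance * math.cos(theta), node_distance * math.sin(theta))
    return [relative_delay(sample_uniform_disk(radius, rng), rx, ex)
            for _ in range(n_tx)]

R = 484.8
far = simulate_delays(R, R * 1e4, 180.0, 10_000)     # hypothetical far distance
close = simulate_delays(R, 1.1 * R, 180.0, 10_000)   # d_c = R + 0.1R
far_bound = 2.0 * R * math.sin(math.radians(180.0) / 2.0) / C
```

Histograms of `far` and `close` can then be compared directly. In the far case the delays are bounded by $2R\sin(\theta/2)/c$, which follows from projecting the transmitter position onto the two look directions.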

3.2.2 SINR and BER Simulation Setup.

In an effort to characterize the performance of a MIMO receiver under the effects of IBI,
the key parameters are investigated in this section. The first of these is transmitter location
in the ROA. Transmitters near the edge of the circle induce greater delays than those
transmitters located near the middle of the circle. This leads to the second parameter,
the size of the ROA. The larger the ROA, the larger the delays in the system. The third
parameter is the length of the impulse response at $\theta = 0°$. Since the cooperative system
operates in this direction the channel is not lengthened by delay. The cooperative system
would be designed to experience no IBI on average, where $N_{cp} \geq L - 1$ would hold,
but would want to have IBI for a system operating at $\theta \neq 0°$. If $N_{cp} = L - 1$ then the
delay experienced at $\theta \neq 0$ would induce IBI. The last two parameters are the number
of transmitters and the angle at which the receiver is located. As the number of transmitters
increases the number of channels increases, which provides more opportunities for IBI.
Finally, as $\theta$ increases to 180° the system experiences larger differences in delay.

Figure 3.6: Delay for Uniform transmitter locations with the Ex $d_c$ meters from the center
of the ROA.

Figure 3.7: Delay for Gaussian transmitter locations with the Ex $d_c$ meters from the center
of the ROA.

Table 3.2 shows the key parameters that influence the amount of IBI a receiver
experiences. The channel length and CP length are set such that one sample of delay
induces IBI. The number of transmitters is chosen to be seven, with one transmitter placed
at the origin of the circle and the other six transmitters equally spaced from each
other at the radius of the ROA. This provides a worst-case picture of the trends of IBI
without having to average over many transmitter locations. The range for $\theta$ considered is
0° - 180° since 180° - 360° is redundant.

Table 3.2: Simulation parameters for Figs. 3.8-3.13

Variable    Description                Value
$R_1$       Radius of ROA              2420 m
$R_2$       Radius of ROA              12.1 km
L           Channel Length             17
$N_{cp}$    Cyclic Prefix Length       16
$N_t$       Number of Transmitters     7
$N_r$       Number of Receivers        7
$\theta$    Angle for a Receiver       0° - 180°
$N_{pu}$    Num. pilots used for CE    19
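The worst-case layout and the role of $N_{cp} = L - 1$ can be illustrated with a short Python sketch. It uses the far-field projection for the relative delay and treats any positive delay spread as at least one extra channel tap, which is a simplification. The sampling frequency is not stated for this simulation, so `fs` below is a hypothetical value, as are the function names.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def seven_tx_layout(radius):
    """One transmitter at the origin and six equally spaced on the circle."""
    ring = [(radius * math.cos(k * math.pi / 3.0),
             radius * math.sin(k * math.pi / 3.0)) for k in range(6)]
    return [(0.0, 0.0)] + ring

def relative_delays_samples(layout, theta_deg, fs):
    """Far-field delay of each transmitter at angle theta, relative to
    the theta = 0 direction, expressed in samples."""
    t = math.radians(theta_deg)
    ux, uy = 1.0 - math.cos(t), -math.sin(t)   # difference of look directions
    return [(x * ux + y * uy) / C * fs for (x, y) in layout]

def induces_ibi(delays, channel_len, cp_len):
    """True when the delay spread lengthens the channel beyond the CP."""
    spread = max(delays) - min(delays)
    effective_len = channel_len + math.ceil(spread)
    return cp_len < effective_len - 1

FS = 1e6                       # hypothetical sampling frequency in Hz
layout = seven_tx_layout(2420.0)
ibi_at_0 = induces_ibi(relative_delays_samples(layout, 0.0, FS), 17, 16)
ibi_at_90 = induces_ibi(relative_delays_samples(layout, 90.0, FS), 17, 16)
```

With $L = 17$ and $N_{cp} = 16$ the cooperative direction is free of IBI in this model, while any other direction is not.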

3.2.3 SINR per Subcarrier.

The signal power of a 4-QAM symbol with constellation {±1 ± j} is 2.
Equation (3.25) represents the error in the system. The theoretical SINR is the ratio of
this signal power to $S_{e_{n_t},k}(f)$, which is considered as a function of frequency in Figure 3.8.
The theoretical SINR is compared to MATLAB simulations where the equalizing CSI is obtained from
the known CSI, FDCE and TDCE.

The error power after equalization is calculated by finding the variance of the
difference between the equalized received symbols and the known transmitted symbols.
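The simulated SINR on one subcarrier can be computed as sketched below in Python. The function name and the sample data are illustrative assumptions; the inputs are taken to be the known and equalized symbols collected on a single subcarrier over several OFDM blocks.

```python
import math

def simulated_sinr_db(tx_symbols, eq_symbols, signal_power=2.0):
    """SINR in dB on one subcarrier.

    The error power is the variance of the difference between the
    equalized symbols and the known transmitted symbols; the default
    signal power of 2 corresponds to the {+-1 +-j} constellation.
    """
    errors = [r - s for r, s in zip(eq_symbols, tx_symbols)]
    mean = sum(errors) / len(errors)
    variance = sum(abs(e - mean) ** 2 for e in errors) / len(errors)
    return 10.0 * math.log10(signal_power / variance)

# Illustrative data: four 4-QAM symbols with small residual errors.
tx = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
eq = [1.1 + 1j, 0.9 - 1j, -1 + 1.1j, -1 - 1.1j]
sinr = simulated_sinr_db(tx, eq)
```

Here the residual errors have variance 0.01, so the SINR is $10\log_{10}(200) \approx 23$ dB.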

3.2.4  SINR  Performance. 

To determine how IBI reduces the expected performance of a receiver at some angle
$\theta$, the average SINR across subcarriers is used as an intermediate metric to reach BER
performance. Average SINR as a function of $\theta$ is provided in Figs. 3.9 and 3.10. Also
shown is the trend as R varies: as R increases, larger delays are experienced, which results
in lower SINR.

Figure 3.8: At an SNR of 20 dB and $\theta = 0$, the theoretical (Thy) SINR, as a function of
frequency, is compared to simulated SINR values where the channel is equalized with the
known CSI (SK), FDCE (SF) and TDCE (ST). (Horizontal axis: Normalized Frequency.)

Figure 3.9 shows the performance of FDCE as a function of $\theta$ and R. Without
perfect knowledge of the CSI the FDCE performs significantly worse than the theoretical
performance. This degradation in performance is a result of phase rotations in the frequency
domain that cannot be equalized because accurate CE cannot be obtained due to the
violation of $N_{cp} \geq L - 1$.

Figure 3.9: The SINR as a function of $\theta$ and R at an SNR of 20 dB. The FDCE ($SF_i$)
performance is compared to the simulated system with known CSI ($SK_i$) as well as the
theoretical ($Thy_i$) performance. Subscripts correspond to $R_i$ in Table 3.2.


However, TDCE does have the ability to correct for delays in the time domain at the
cost of computational complexity. Figure 3.10 shows that the performance of the TDCE is
about a 5 dB improvement at $\theta$ [value illegible].

In Figs. 3.9 and 3.10 at $\theta = 90°$ there is a performance gap between the theoretical
curve and the simulated system with known CSI. This effect is noticed when $\theta > 0$ but is
more pronounced as $\theta$ increases. To show the cause of this effect, Figure 3.11 shows the
SINR performance as a function of subcarrier, similar to Figure 3.8 where $\theta = 0°$, but here
$\theta = 90°$. It is clear from Figure 3.11 that all three simulated schemes have compromised
accuracy compared to Figure 3.8, with the worst being the FDCE. At $\theta = 90°$ there is a
significant amount of IBI in the received signal. Even when the CSI is known at the receiver
the performance is degraded due to this IBI. In this case, the signal is fundamentally
compromised and cannot be equalized with the known channel. With TDCE the
performance is degraded even further by estimation error.

Figure 3.10: The SINR as a function of $\theta$ and R at an SNR of 20 dB. The TDCE ($ST_i$)
performance is compared to the simulated system with known CSI ($SK_i$) as well as the
theoretical ($Thy_i$) performance. Subscripts correspond to $R_i$ in Table 3.2.
Figure 3.11: At an SNR of 20 dB and $\theta = 90°$, the theoretical (Thy) SINR, as a function of
frequency, is compared to simulated SINR values where the channel is equalized with the
known CSI (SK), FDCE (SF) and TDCE (ST). (Horizontal axis: Normalized Frequency.)


3.2.5  BER  Performance. 

The theoretical BER calculated from the SINR in Figs. 3.9 and 3.10 is shown in
Figs. 3.12 and 3.13. The theoretical curves are calculated by converting the SINR at
each subcarrier to BER, by Equation (3.26), which assumes the noise is Gaussian. This
assumption is what accounts for the slight disagreement between the theory and known
CSI curves. The BER is then averaged over subcarrier and transmitter for a particular value
of $\theta$. The result is a BER performance for the receiver that considers all the transmitted
data. At $\theta > 50°$ for $R_2$ the delay between transmitted waveforms is so large that the
communication link is unusable. This is represented in Figs. 3.12 and 3.13 by the ramp up
to BER = 0.5.

Figure 3.12: BER as a function of $\theta$ and R at an SNR of 20 dB. The FDCE ($SF_i$)
performance is compared to the simulated system with known CSI ($SK_i$) as well as the
theoretical ($Thy_i$) performance. Subscripts correspond to $R_i$ in Table 3.2.


3.3  Conclusions 

In multi-carrier communication systems, IBI occurs if the CP is shorter than the
impulse response. In MIMO-OFDM systems each impulse response could contribute to
the total IBI power. Generally, all channel lengths are the same in MIMO communication
systems. However, topological delays induce longer impulse responses for distant
transmitters with respect to close transmitters. Derived in this chapter is the theoretical
SINR for such a system with a fixed set of delays. The expected performance in terms of
BER is shown to agree with simulations of the system.

Figure 3.13: BER as a function of $\theta$ and R at an SNR of 20 dB. The TDCE ($ST_i$)
performance is compared to the simulated system with known CSI ($SK_i$) as well as the
theoretical ($Thy_i$) performance. Subscripts correspond to $R_i$ in Table 3.2.

In a cooperative system design the $\theta = 0$ direction is considered. The FDCE and
TDCE are favorable CE techniques for Rx. With these designs Rx performs
favorably while Ex performs poorly, making Ex less likely to demodulate data not
intended for it and making these processes a viable physical layer security technique. For
example, even with a known channel, Ex at 35° experiences a 10-15 dB drop in SINR and
a 10× increase in BER for our simulation scenario.
IV. Implementation of Multi-User NC-OFDM TOA Estimation Algorithms


The performance of an Ex is improved if the delays between the $N_t$ users are known,
making the TDCE usable. This chapter discusses estimating these delays between users
using an OFDMA waveform. Two algorithms are considered. The first algorithm assumes
each user occupies all N subcarriers. Since this is not the case in OFDMA, the performance
of this estimate is degraded compared to the second algorithm. In the second algorithm
the correlation induced by a user using a subset of the available subcarriers is leveraged to
determine the delay between the $N_t$ users and assign the delays to each user. Simulation
results for the two algorithms are provided. A small-scale hardware implementation is
then outlined and results are provided. This chapter ends with a discussion of a large-scale
implementation that is set up for future work with the goal of fixing synchronization
concerns between all receivers.


4.1  Estimators 

If the number of OFDM symbols $N_b > 1$, then $\mathbf{y}_{n_b}$ represents the $n_b$-th received OFDM
symbol, in which case all $N_b$ OFDM symbols are used to estimate the delay. The Appendix
derives the estimator under this condition and the result is provided here [26]:

$\hat{d} = \arg\min_d \left( N_b \log|\mathbf{C}_d| + \sum_{n_b=1}^{N_b} \mathbf{y}_{n_b}^H \mathbf{C}_d^{-1} \mathbf{y}_{n_b} \right),$  (4.1)


where $\mathbf{C}_d$ is defined in Equation (2.21). Equation (2.22) is the cost function of the van de
Beek method, which is repeated here [27]:

$\hat{d} = \arg\max_d \{ |\gamma(d)| - \rho\Phi(d) \},$  (4.2)

where

$\gamma(s) = \sum_{m=s}^{s+N_{cp}-1} y(m)\, y^*(m + N).$  (4.3)

The $\rho\Phi(d)$ term is assumed to be zero in these simulations and experiments. In the
next three sections, the van de Beek and Acharya methods are compared in MATLAB
simulations and in an implementation using the WARP boards.
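The cyclic-prefix correlation of Equation (4.3) can be illustrated with a minimal, noise-free Python sketch. It is not the MATLAB or WARP implementation; the function names are hypothetical, the $\rho\Phi(d)$ term is dropped as stated above, and a block of unit-modulus random-phase samples stands in for the time-domain OFDM symbol. The delay is created by prepending zeros.

```python
import cmath
import random

def cp_correlation(y, s, n_fft, n_cp):
    """gamma(s) of Equation (4.3); samples past the record count as zero."""
    total = 0j
    for m in range(s, s + n_cp):
        if m + n_fft < len(y):
            total += y[m] * y[m + n_fft].conjugate()
    return total

def estimate_toa(y, n_fft, n_cp, search_len):
    """Return the s in [0, search_len) that maximizes |gamma(s)|."""
    return max(range(search_len),
               key=lambda s: abs(cp_correlation(y, s, n_fft, n_cp)))

def make_symbol(n_fft, n_cp, rng):
    """Stand-in symbol: random-phase samples preceded by their cyclic prefix."""
    body = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(n_fft)]
    return body[-n_cp:] + body

rng = random.Random(1)
N, NCP, TRUE_DELAY = 64, 16, 20
received = [0j] * TRUE_DELAY + make_symbol(N, NCP, rng) + [0j] * 40
toa_hat = estimate_toa(received, N, NCP, search_len=60)
```

In this idealized setting the correlation peaks exactly at the start of the cyclic prefix, because only there do all $N_{cp}$ products pair a sample with its own copy.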

4.2  Simulation 


Table 4.1: Simulation parameters for Figs. 4.1a-4.1b

Variable    Description                      Value
N           Number of Subcarriers            256
$N_b$       Number of OFDM Blocks            1
$N_{cp}$    CP Length                        16
$N_t$       Number of users                  4
M           Number of Monte-Carlo Trials     1,000
N'          OFDMA Block Length               272
d           Propagation Delay                [20 40 60 80]
SNR         Signal to Noise Ratio            5 dB

This section demonstrates the accuracy of the van de Beek method [27] and Acharya
method [26] via MATLAB simulation. The parameters for the simulations are provided
in Table 4.1. The peaks in Figure 4.1a correspond to d. The estimates are in the range of
-136 to 136, which corresponds to ±0.5N'.

Each of the $N_t = 4$ users is allocated subcarriers on which to transmit data. The
subcarrier assignment scheme used is the same as in Figure 3 in [26], where each user is
allotted a fourth of the subcarriers in four equally spaced discontiguous sub-bands. As
[26] states, under dynamic allocation of subcarriers this scheme is a possible scenario.
Figure 4.1: MATLAB simulation of the TOA estimation algorithms with system parameters
provided in Table 4.1. (a) van de Beek method. (b) Acharya method. (Horizontal axes: TOA Estimate (Samples).)


Figure  4.1b  shows  the  performance  of  Equation  (4.1)  which  assumes  knowledge  of 
which  subcarriers  are  used  by  each  user.  In  this  work  it  is  assumed  that  the  subcarrier 
assignments  are  known  at  Ex.  With  the  knowledge  of  the  subcarriers  used  by  each  user 
the  TOA  is  determined  for  each  user.  The  accuracy  is  improved  with  the  Acharya  method 
because  the  correlation  in  the  time  domain  is  modeled  in  the  estimator. 

4.3  Small-Scale  Hardware  Implementation 

In this section the implementation of two TOA algorithms is compared in terms of
accuracy and execution speed. First, the WARP board test bed used in this experiment is
outlined. Then the van de Beek method is implemented and studied with multiple OFDMA
symbols used at the receiver to estimate the TOA. Finally, the Acharya method is also
implemented on the same platform and its accuracy and computational complexity are
compared.
4.3.1 WARP Board Test-bed.


Recall that the delay experienced at the receiver is made up of two components. The first is
$\phi_{n_t} = \tau_{rx} - \tau_{n_t}$, where $\tau_{rx}$ denotes the local clock at the receiver and $\tau_{n_t}$ is the local clock at
the transmitter. $\Delta_{n_t}$ is then the propagation delay induced by the path length. The goal in
this TOA study is to estimate $\Delta_{n_t}$; however, with $\phi_{n_t}$ playing a role in the true timing estimate,
a repeatable experiment is not easily attainable.

For the test-bed to have a repeatable experiment a single WARP board is used with
four transmit antennas. Each antenna is considered one user. In this scenario the four
users have the same local clock, making $\phi_{n_t}$ constant across $n_t$, henceforth denoted $\phi$. The
receiver estimates the $N_t = 4$ user TOA values. The relative delay between each user is
representative of $\Delta_{n_t}$ with an overall delay corresponding to $\phi$.

The transmitter and receiver WARP boards are controlled by MATLAB via an Ethernet
switch. The Ethernet cables between the controller computer and the two WARP boards are
used for rough synchronization. At the sampling frequency of the WARP boards, $f_s = 40$
MHz, the Ethernet synchronization is reliable to ±100 samples in the received signal.

The Ethernet cable length constrains the physical size of the test-bed; Ethernet cables
are 100 feet in length. For this reason, the delay between transmitted waveforms is
artificially added by prepending zeros to the transmitted waveform in MATLAB before
transmission. The number of zeros prepended corresponds to $\Delta_{n_t}$ and is what is to be
estimated by the van de Beek and Acharya methods. The experiment parameters used are
provided in Table 4.2.

4.3.2  van  de  Beek  Method. 

Figure 4.2 shows a histogram of $\hat{d}$ where the peaks at [26, 156, 70, 114] correspond
to $d + \phi$ where $\phi = 20$. Figure 4.2 also shows the estimator accuracy as a function of the
number of OFDM symbols used for the estimate. The cost function, $\gamma(s)$, associated with
each OFDM symbol is averaged and then the maxima are found. As the number of OFDM
symbols increases the effect of noise is reduced due to the averaging, which results in more
reliable estimates. $N_b = 6$ OFDM symbols are used in Figure 4.2.

Table 4.2: Experiment parameters for Figs. 4.2-4.3

Variable    Description                      Value
N           Number of Subcarriers            64
$N_b$       Number of OFDM Blocks            6
$N_{cp}$    CP Length                        16
$N_{mc}$    Number of Monte-Carlo Trials     2,000
d           Propagation Delay                [6.25, 135.625, 50, 93.75]

An issue with this method is that there is no way to discern users. The values of $\gamma(s)$
do not relay this information. This issue is problematic in estimating the timing delays for
the Ex to equalize the users in the ROA. The Acharya method, discussed next, does allow each user's
TOA to be calculated.

4.3.3  Acharya  Method. 

The Acharya method assumes the subcarriers used by each user are known. This
information is captured in the construction of $\mathbf{C}_d$ as a function of $n_t$. Each user is allocated
subcarriers on which to transmit their data. The subcarriers that are not used by a specific
user are zeroed, which introduces correlation between time domain samples that degrades
the van de Beek method's performance.

The Acharya method is more accurate than the van de Beek method. The
Root-Mean Squared Error (RMSE) for the Acharya method is 20.08 and for the van de
Beek method the RMSE is 38.71. This is shown in Figure 4.3 where each user is separated
out and an estimate histogram is presented for $N_{mc} = 2000$ Monte-Carlo trials.




Figure 4.2: Histogram of TOA estimates for the van de Beek method. Peaks correspond to
the four users in the system. Results for different numbers of OFDM symbols used in the
averaging of the cost function are also plotted. As the number of OFDM blocks increases a
more accurate estimate is obtained. The RMSE for each user is 38.71.


4.4  Large-Scale  Hardware  Implementation 

Thus far, the application for the TOA estimates was to determine the delays between
the received waveforms for Ex to improve its fidelity. A small extension of this concept is
TDOA position estimation. However, a large issue with TDOA is synchronization between
the $N_r$ receive antennas. This is needed for a time of reference from which the time
difference is taken.

Figure 4.3: TOA estimates according to the Acharya method for each user. The RMSE for
each user is 20.08. (Recovered panel titles: (c) User 3 TOA estimate histogram, (d) User 4 TOA estimate histogram.)


The  small-scale  implementation  solved  this  issue  by  having  all  the  users  synchronized 
using  the  one  Local  Oscillator  (LO)  on  the  WARP  board.  The  WARP  board  itself  is 
designed  for  work  in  MIMO  communications  where  the  antennas  are  synchronized.  This 
platform  lends  itself  to  TDOA  estimation.  However,  in  a  large  scale  implementation  the 
receivers  need  to  be  synchronized  over  hundreds  of  feet. 

To synchronize the $N_r = 4$ receivers in our testbed a reference transmitter (denoted
$Tx_{ref}$) is used. The position of this transmitter is known along with the positions of each of
the receivers as depicted in Figure 4.4. The delays from $Tx_{ref}$ to each receiver are known and
denoted $\boldsymbol{\Delta}_r = [\Delta_{r,1}\ \Delta_{r,2}\ \cdots\ \Delta_{r,N_r}]$. Each transmitter has a local time associated with its LO;
the local time is denoted $t_{ref}$ for $Tx_{ref}$ and $t_{tx}$ for the transmitter to be located, $Tx_{ukn}$. The
local time for each receiver is denoted by $\tau_{n_r}$ for $n_r \in \{1, 2, \ldots, N_r\}$.


Figure 4.4: Illustration of the four receivers (Rx1, Rx2, Rx3 and Rx4) used to determine TDOA measurements for
the unknown transmitter, $Tx_{ukn}$. Also depicted is the reference transmitter, $Tx_{ref}$, used to
synchronize the four receivers.


At the $n_r$-th receiver the TOA is found for both the reference transmitter, $t_{n_r,ref}$, and
the unknown transmitter, $t_{n_r}$, which are assumed to be operating in two different frequency
bands. The TOAs for both transmitters are modeled as

$t_{n_r} = t_{tx} + \Delta_{u,n_r} + \tau_{n_r},$

$t_{n_r,ref} = t_{ref} + \Delta_{r,n_r} + \tau_{n_r},$

where $\Delta_{u,n_r}$ denotes the unknown propagation delay from the unknown transmitter to the
$n_r$-th receiver.

To determine the TDOA between the $n_{r1}$-th and $n_{r2}$-th receivers consider the following
equations:

$t_{n_{r1}} = t_{tx} + \Delta_{u,n_{r1}} + \tau_{n_{r1}},$

$t_{n_{r2}} = t_{tx} + \Delta_{u,n_{r2}} + \tau_{n_{r2}},$

$t_{n_{r1},ref} = t_{ref} + \Delta_{r,n_{r1}} + \tau_{n_{r1}},$

$t_{n_{r2},ref} = t_{ref} + \Delta_{r,n_{r2}} + \tau_{n_{r2}}.$  (4.4)

The values $t_{n_{r1}}$, $t_{n_{r2}}$, $t_{n_{r1},ref}$ and $t_{n_{r2},ref}$ are estimated at each receiver and are provided
to a central processing center for TDOA estimation. At the processing center, $\mathbf{m} =
[m_1, m_2, \ldots, m_{N_r}]$ is defined where

$m_{n_r} = t_{n_r} - t_{n_r,ref} + \Delta_{r,n_r}$

$\quad = t_{tx} - t_{ref} + \Delta_{u,n_r},$

$TDOA_{n_{r1},n_{r2}} = m_{n_{r1}} - m_{n_{r2}}$

$\quad = \Delta_{u,n_{r1}} - \Delta_{u,n_{r2}}.$  (4.5)
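The cancellation of the receiver clock offsets in Equations (4.4) and (4.5) can be checked numerically. The Python sketch below uses made-up values in samples; the variable names, clock offsets and delays are all hypothetical and serve only to show that $\tau_{n_r}$, $t_{tx}$ and $t_{ref}$ drop out of the TDOA.

```python
def synchronized_measurement(toa_unknown, toa_reference, ref_delay):
    """m for one receiver: TOA of the unknown transmitter minus TOA of
    the reference transmitter plus the known reference delay."""
    return toa_unknown - toa_reference + ref_delay

def tdoa(m, i, j):
    """TDOA between receivers i and j (zero-based indices)."""
    return m[i] - m[j]

# Hypothetical values, all in samples.
t_tx, t_ref = 1000.0, 400.0            # transmitter local times
delta_u = [0.0, 2.0, 4.0, 6.0]         # unknown-transmitter propagation delays
delta_r = [3.0, 1.0, 5.0, 2.0]         # known reference propagation delays
tau = [13.0, -7.0, 55.0, 21.0]         # unknown receiver clock offsets

toa_u = [t_tx + du + t for du, t in zip(delta_u, tau)]
toa_r = [t_ref + dr + t for dr, t in zip(delta_r, tau)]
m = [synchronized_measurement(u, r, dr)
     for u, r, dr in zip(toa_u, toa_r, delta_r)]
tdoas = [tdoa(m, 0, k) for k in (1, 2, 3)]
```

Whatever offsets are chosen for `tau`, the resulting TDOA values depend only on the differences of the entries in `delta_u`.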


To test the accuracy of this method the WARP boards were set up in a hallway. The
hallway itself is 150 feet long. The four receivers are set equally spaced in the hallway.
Figure 4.5 illustrates how the receivers are spaced in the hallway. The unknown transmitter
and reference transmitter are co-located at one end of the hall.


Figure 4.5: Reference transmitter, unknown transmitter and the four receiver locations in a
hallway to verify the TDOA measurement accuracy. (Diagram: Tx,ukn and Tx,ref at 0 ft; Rx1, Rx2, Rx3 and Rx4 at 0 ft, 50 ft, 100 ft and 150 ft.)

This test is then run to ensure that the desired TDOA values are obtained. First the
expected values are calculated. The speed of light $c = 9.8 \times 10^8$ ft/s and the sampling
frequency $f_s = 40$ MHz are needed. If the path length of a transmitted signal to two WARP
boards differs by $c/f_s = 24.5$ ft, the resulting TDOA between those two WARP boards is 1
sample. For the test depicted in Figure 4.5, adjacent receivers are 50 ft apart, so each WARP
board has approximately a two sample difference from an adjacent receiver.
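The expected TDOA values for the hallway test follow from simple arithmetic, sketched here in Python. The constants come from the text; the function name and the one-dimensional hallway model, with both transmitters at 0 ft, are illustrative assumptions.

```python
C_FT_PER_S = 9.8e8                        # speed of light in ft/s
FS_HZ = 40e6                              # WARP board sampling frequency
FEET_PER_SAMPLE = C_FT_PER_S / FS_HZ      # path difference worth one sample

def expected_tdoa_samples(ref_rx_ft, rx_ft, tx_ft=0.0):
    """Expected TDOA, in whole samples, of the reference receiver
    relative to another receiver along a straight hallway."""
    path_ref = abs(ref_rx_ft - tx_ft)
    path_rx = abs(rx_ft - tx_ft)
    return round((path_ref - path_rx) / FEET_PER_SAMPLE)

receivers_ft = [0.0, 50.0, 100.0, 150.0]
expected = [expected_tdoa_samples(receivers_ft[0], p) for p in receivers_ft[1:]]
```

The 50 ft spacing corresponds to about 2.04 samples per receiver, which rounds to the two sample difference quoted above.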

Figure 4.6 shows the TDOA estimates for 1000 trials. The van de Beek method is used
where one user transmits on all subcarriers. In each case the expected number of samples
±1 is estimated the majority of the time.

Each TDOA estimate is negative. The reason for this is that the first WARP board
receiver is at the same location as the transmitters. The first receiver TOA estimate is used
as the reference, from which the other TOA values are subtracted in Equation (4.5).
As a result, the TDOA values are negative because the signal reaches the other receivers
after the first receiver.

The hallway test is designed to verify the use of Equation (4.4) to obtain Equation (4.5).
However, the test is not interesting in terms of locating the unknown transmitter. For more
interesting topologies, as shown in Figure 4.4, the physical limitation of Ethernet length is
addressed to provide a larger test-bed.

To obtain a larger test-bed, the wired connection between the Ethernet switch and
each receive WARP board is converted to a wireless connection via an Ethernet bridge
[41]. The wired connection speed is one Gigabit per second. The wireless communication
connection does not have this capability. An Ethernet switch is used to convert from
Gigabit Ethernet to 100 Mb/s connection speeds.

Figure 4.6: Hallway Test. (a) TDOA $m_1 - m_2$ where the true delay is -2. (b) TDOA $m_1 - m_3$ where the true delay is -4. (c) TDOA $m_1 - m_4$ where the true delay is -6.

4.5  Conclusions 

This  chapter  discussed  the  feasibility  of  obtaining  TOA  estimates  with  an  application 
for  an  Ex  to  reduce  interference  and  obtain  a  reliable  estimate  of  the  transmitted  signal.  The 
application  of  using  the  TOA  estimates  in  TDOA  position  estimation  is  also  considered. 

The two methods for estimating the TOA, van de Beek and Acharya, were considered. Simulations showed positive results for both algorithms; however, the Acharya method is more accurate than the van de Beek method. This is a result induced by the time domain correlation introduced when users occupy a subset of the available subcarriers.

Two implementations followed the simulation results. A small scale implementation showed the performance of the van de Beek and Acharya methods where the four users shared an LO. This implementation did not attempt to solve the synchronization issue between receivers, since only one receiver was used. The physical limitations of the WARP board test-bed were also not an issue, since the delays between users were artificially added to the transmitted signal.

Preliminary work on the large scale implementation showed progress in the synchronization between receivers with the use of a reference transmitter. The physical limitation of the WARP board implementation was also circumvented by the use of an Ethernet bridge, which removed the wired connection between the receiving WARP board and the controlling computer.

V. FPGA Resource Utilization for MIMO-OFDM Receivers


An Rx is designed in this chapter based on the results found in Section 3.2.4. The FDCE method is employed due to its added physical layer security attributes. However, in this chapter maximizing the data rate is used as the design criterion. Increasing the size of the MIMO system increases the data rate, but it also increases the amount of resources needed to implement the system. This chapter shows the trends of the resource usage as a function of the size of the MIMO system.

The  goal  of  this  chapter  is  not  to  optimally  determine  the  correct  balance  between 
resource  usage,  latency  and  BER  for  the  entire  MIMO  receiver  design.  The  goal  is  to 
show  how  resource  usage  and  data  rate  scale  as  a  function  of  the  size  of  the  MIMO  system 
deployed. 

This chapter first breaks down the MIMO receiver into smaller calculation blocks. These blocks are VHDL modules that are analyzed in the design. For each block, the number of slice registers, slice Look-Up Tables (LUTs), and Digital Signal Processing (DSP) resource blocks are reported. With this information, some intuition is gained regarding the resource hungry blocks and some ways to reduce the resource usage are discussed.

The MIMO OFDM receiver is then designed for Na transmit and Na receive antennas with Na ∈ {2, 3, 4}. A comparison between resource utilization shows how the blocks scale as a function of the number of antennas in the system. From here, extrapolation is used to draw a relation between desired throughput of a communication link and the number of resources required to implement the MIMO receiver with a target FPGA of the VX980T. Also considered are the VX1140T and VX2000T FPGAs. If the desired throughput requires more resources than the largest FPGA provides, some architecture possibilities are discussed to alleviate the resource constraint. This requires designing a board with multiple FPGAs and considering data rates between FPGAs to effectively double or triple available resource counts.

Figure 5.1: Block diagram of a MIMO receiver that utilizes frequency separated pilot tones.

5.1  MIMO  Receiver  Architecture 

The MIMO receiver is decomposed into eight calculation blocks. These blocks implement functions such as the FFT, matrix-vector multiplication, and matrix inversion, as well as a block that routes data and pilot samples to the correct locations. Figure 5.1 pictures nine blocks because the FFT block is needed in two locations of the full MIMO receiver. The functional description of these blocks starts with the matched filter that synchronizes the receiver with the transmitted preamble.

5.1.1  Correlation. 

The  correlation  process,  Block  1,  provides  three  features.  The  first  is  the  matched 
filter  that  is  used  to  synchronize  the  receiver.  Detection  of  the  peak  of  the  output  of  the 
matched  filter  signifies  the  preamble  has  been  received  and  data  demodulation  can  begin. 
The  second  and  third  features  are  a  Low  Pass  Filter  (LPF)  and  the  downsampling  process. 
The  LPF  avoids  aliasing  and  for  our  studies  a  32-tap  Square-Root  Raised  Cosine  (SRRC) 
filter  is  used  at  an  oversampling  factor  of  8. 

A matched filter is used at each of the Na receive antennas. A multichannel transposed Finite Impulse Response (FIR) filter architecture is chosen for its resource reuse properties [49]. Each channel's data rate is two times the sampling frequency, fs. The multichannel design allows this block to scale very well as Na increases. As Na increases, the calculation speed of the filter, Na fs, is increased by a factor of Na. This is a valid approach until the calculation clock frequency exceeds the resource switching speed of the FPGA.
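
The resource reuse of a multichannel filter can be illustrated with a small behavioral sketch. It is not the transposed hardware structure itself and makes no claim about the Xilinx core; it only assumes that the Na channels are interleaved sample by sample, so one shared set of coefficients can serve every channel when the filter is clocked Na times faster. The tap values and sample data are made up for illustration.

```python
def fir(samples, taps):
    """Reference single-channel FIR: y[n] = sum_k taps[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(taps):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def multichannel_fir(interleaved, taps, n_ch):
    """One shared filter applied to an interleaved multichannel stream.

    Sample n belongs to channel n % n_ch, so stepping back k * n_ch samples
    stays inside the same channel. The filter processes n_ch samples per
    channel sample period but needs only one set of coefficients.
    """
    out = []
    for n in range(len(interleaved)):
        acc = 0.0
        for k, c in enumerate(taps):
            idx = n - k * n_ch
            if idx >= 0:
                acc += c * interleaved[idx]
        out.append(acc)
    return out


# Hypothetical example: two channels, three taps.
taps = [0.5, 0.25, 0.25]
ch0 = [1.0, 2.0, 3.0, 4.0]
ch1 = [-1.0, 0.0, 1.0, 0.0]
interleaved = [s for pair in zip(ch0, ch1) for s in pair]
y = multichannel_fir(interleaved, taps, n_ch=2)
```

De-interleaving `y` gives the same outputs as filtering each channel separately, which is why the resource count can stay flat while the clock rate grows with Na.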

5.1.2  Fast  Fourier  Transform. 

The FFT algorithm is used in two locations, the first of which (Block 2) is after synchronization is accomplished. The Na received sample chains are divided up into blocks of N subcarriers excluding the CP and are transformed to the frequency domain. The second location, Block 6, is after the impulse responses are estimated. The total number of FFT calculations needed is Na in Block 2 and Na^2 in Block 6.

The Xilinx Core Generator provides a core that calculates the FFT algorithm using Block RAM and DSP48E1 blocks. The option is also provided to instantiate the FFT in logic, which is beneficial when many DSP operations are being considered with limited DSP48E1 blocks available on the FPGA [50]. The tradeoff between using DSP48E1 blocks and slice logic is discussed in Section 5.2.

5.1.3  Store  Samples. 

Pilot  and  data  tones  are  defined  in  the  frequency  domain;  as  the  FFT  blocks  calculate 
and  push  out  frequency  domain  samples,  pilots  and  data  samples  are  separated  in  Block  3. 
The  pilot  samples  are  stored  and  routed  to  the  CE  processes.  The  data  samples  are  stored 
and  await  equalization. 

Consideration is given to storing the samples in Block RAM. This ensures slice registers are not occupied with storage responsibilities when Block RAM is available. The architecture of the Virtex-7 Series allows dual-port Block RAM access [51], so at most two accesses to a single Block RAM are allowed in one clock cycle. Block RAM therefore conserves slice resources, but if resources are not constrained or latency is a tighter constraint, LUTs can provide a way to access more than two samples in one clock cycle.

5.1.4  Frequency  Response  Estimation. 

Recall the pilot tones are frequency separated and the pilot samples that were transmitted are known at the receiver. The frequency response estimation block, Block 4, divides the received pilot sample, $y_{n_r}(k)$, by the known transmitted pilot, $x_{n_t}(k)$. The Xilinx divider core is instantiated to perform this calculation. This core has the ability to utilize slice logic or DSP48E1s depending on resource utilization requirements. As the number of receivers increases, the number of divider cores also increases to reduce latency.

5.1.5  DFT  Matrix  Interpolation. 

The result of the frequency response estimation block is a set of vectors, each having L samples representing frequency responses at particular (non-contiguous) subcarriers. The DFT matrix interpolation block, Block 5, uses the L samples to interpolate the full N samples of the frequency response.

To do this calculation, a pseudo-inverse of a submatrix of the DFT is used. Since the pilot tone subcarriers are known a priori, the submatrix and its inverse are calculated off line. $\mathcal{F}_N$ denotes the full DFT matrix. The submatrix $\mathbf{W}_{n_r,n_t}$ is then calculated by

$$\mathbf{W}_{n_r,n_t} = \mathcal{F}_N(1{:}L,\ \mathbf{k}_{n_r,n_t}) \tag{5.1}$$

where $\mathbf{k}_{n_r,n_t}$ denote the subcarriers used as a function of the transmitter and receiver. The task of this calculation block is to simply multiply the calculated inverse times each $\mathbf{h}_{n_r,n_t}$ to estimate each impulse response,

$$\hat{\mathbf{h}}_{n_r,n_t} = \mathbf{W}^{\dagger}_{n_r,n_t}\,\mathbf{h}_{n_r,n_t} \tag{5.2}$$

where $\mathbf{W}^{\dagger}_{n_r,n_t} = (\mathbf{W}^{H}_{n_r,n_t}\mathbf{W}_{n_r,n_t})^{-1}\mathbf{W}^{H}_{n_r,n_t}$. The total number of matrix-vector multiplications calculated by this block is $N_a^2$.

The result of one matrix-vector multiplication is an estimate of the impulse response between the $n_t$-th transmitter and $n_r$-th receiver pair, denoted by $\hat{\mathbf{h}}_{n_r,n_t}$. This vector is then pushed into an FFT block, Block 6, outlined in Section 5.1.2, which results in the full estimated frequency response.
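
The interpolation step can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the VHDL design: the sizes N = 16 and L = 4, the pilot subcarrier indices, and the channel taps are all hypothetical, and the DFT submatrix is oriented with one row per pilot subcarrier. With exactly L pilots the submatrix is square, so the pseudo-inverse reduces to an ordinary inverse; the sketch solves the linear system directly, whereas the text precomputes the inverse off line.

```python
import cmath


def dft_entry(k, n, N):
    """Entry (k, n) of the N-point DFT matrix."""
    return cmath.exp(-2j * cmath.pi * k * n / N)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


N, L = 16, 4                      # hypothetical FFT size and channel length
pilots = [1, 5, 9, 13]            # hypothetical pilot subcarriers
h_true = [1.0 + 0.5j, 0.3 - 0.2j, -0.1j, 0.05]   # made-up L-tap channel

# True frequency response on all N bins, and its samples at the pilot tones.
H_full = [sum(h_true[n] * dft_entry(k, n, N) for n in range(L)) for k in range(N)]
H_pilot = [H_full[k] for k in pilots]

# DFT submatrix: rows are pilot subcarriers, columns are the first L taps.
W = [[dft_entry(k, n, N) for n in range(L)] for k in pilots]
h_est = solve(W, H_pilot)

# Zero-pad to N samples and transform (the role of Block 6) to get all N bins.
h_pad = h_est + [0j] * (N - L)
H_est = [sum(h_pad[n] * dft_entry(k, n, N) for n in range(N)) for k in range(N)]
```

In this noise-free setting the L pilot samples are enough to recover both the impulse response and the full N-point frequency response.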


5.1.6  Channel  Matrix  Inversion. 

The architecture and timing of the FFT block provides a sample every clock cycle once the FFT is calculated. This is leveraged by instantiating $N_a^2$ FFT blocks. Each of them is started at the same time, and once the calculation is complete samples are provided at each clock cycle. During the first clock cycle that data is valid out of the FFT blocks, the $N_a^2$ samples are reshaped into the matrix $\mathbf{H}_{k=1} \in \mathbb{C}^{N_a \times N_a}$.

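To invert the complex $\mathbf{H}_k$, the QR Decomposition (QRD) takes in a real matrix of the form:

$$\tilde{\mathbf{H}}_k = \begin{bmatrix} \mathrm{Re}\{\mathbf{H}_k\} & -\mathrm{Im}\{\mathbf{H}_k\} \\ \mathrm{Im}\{\mathbf{H}_k\} & \mathrm{Re}\{\mathbf{H}_k\} \end{bmatrix} \tag{5.3}$$

where $\tilde{\mathbf{H}}_k \in \mathbb{R}^{2N_a \times 2N_a}$. The QRD calculates $2N_a$ iterations [52]. The result is then a unitary matrix $\mathbf{Q}_k$ and an upper triangular matrix $\mathbf{R}_k$.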
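
The real-valued form in Equation (5.3) can be checked with a short sketch. It assumes that complex vectors are stacked as real parts followed by imaginary parts, which is the convention that matches the block layout of the matrix; the 2x2 channel and the symbols are made-up values.

```python
def real_form(H):
    """Build [[Re(H), -Im(H)], [Im(H), Re(H)]] from a complex square matrix."""
    n = len(H)
    top = [[H[i][j].real for j in range(n)] + [-H[i][j].imag for j in range(n)]
           for i in range(n)]
    bottom = [[H[i][j].imag for j in range(n)] + [H[i][j].real for j in range(n)]
              for i in range(n)]
    return top + bottom


def stack(v):
    """Stack a complex vector as [real parts, imaginary parts]."""
    return [z.real for z in v] + [z.imag for z in v]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


H = [[1 + 2j, 0.5 - 1j], [-1j, 2 + 0j]]    # made-up 2x2 channel matrix
x = [1 - 1j, -1 + 1j]                       # made-up transmitted symbols

y_complex = matvec(H, x)                    # complex-valued product
y_real = matvec(real_form(H), stack(x))     # real-valued product, twice the size
```

The real product reproduces the stacked complex product, so a QRD that only handles real arithmetic can still be used to invert the complex channel.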

This process is computationally intensive and needs to be calculated N - NaL times. It is a waste of resources to instantiate N - NaL QRD blocks. This is especially true since the FFT blocks are providing a new matrix to decompose every clock cycle, excluding pilot tones. At each clock cycle the FFT blocks each provide a sample to populate the matrix. The data are not available all at once. Said another way, the matrix associated with k = 10 does not have to be known before the matrix associated with k = 1 is decomposed. The ability to use the time between matrix arrivals for calculation, along with a higher clock frequency for QRD calculations, provides great resource savings [53].

5.1.7  Equalization. 

The result of this block is the equalized data symbols $\mathbf{x}_k$, where $\mathbf{y}_k = \tilde{\mathbf{H}}_k \mathbf{x}_k$ and $\tilde{\mathbf{H}}_k$ is the real-valued channel matrix of Equation (5.3). The QRD block decomposed $\tilde{\mathbf{H}}_k$ such that $\tilde{\mathbf{H}}_k = \mathbf{Q}_k \mathbf{R}_k$. Substitution then provides $\mathbf{y}_k = (\mathbf{Q}_k \mathbf{R}_k)\mathbf{x}_k$. The equalization process consists of two stages. The first is matrix-vector multiplication and the second is backwards substitution, both of which are contained in Block 8.

The first stage calculates the matrix-vector multiplication corresponding to $\mathbf{b}_k = \mathbf{Q}_k^{T} \mathbf{y}_k$. Since $\mathbf{Q}_k$ is unitary, its inverse is its transpose (since $\mathbf{Q}_k$ is real), which is considered with the format defined in Equation (5.3).

The second stage leverages the upper-triangular structure of $\mathbf{R}_k$; after the calculations of stage one we have $\mathbf{b}_k = \mathbf{R}_k \mathbf{x}_k$. The last row then represents one equation with one unknown, which can be solved directly. The second to last row now has one unknown since the last value of $\mathbf{x}_k$ is known. This process is repeated $2N_a$ times. Once this is complete the equalized symbols have been calculated.
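
The backwards substitution stage described above can be sketched in a few lines. This is a generic textbook routine offered for illustration, not the thesis's hardware implementation; the upper-triangular matrix and the solution vector are made-up values. Note the single divide per row, which is the operation the following paragraph identifies as costly on an FPGA.

```python
def back_substitute(R, b):
    """Solve R x = b for an upper-triangular R, starting from the last row."""
    n = len(R)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Everything to the right of the diagonal is already known.
        known = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - known) / R[i][i]   # one divide per row
    return x


# Made-up upper-triangular system with a known solution.
R = [[2.0, 1.0, -1.0],
     [0.0, 4.0, 2.0],
     [0.0, 0.0, 0.5]]
x_true = [1.0, -1.0, 1.0]
b = [sum(r * v for r, v in zip(row, x_true)) for row in R]

x = back_substitute(R, b)
```

For a 2Na x 2Na system the loop runs 2Na times, matching the repetition count given in the text.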

The implementation of this algorithm is straightforward; however, many divide operations are needed, which are resource intensive on an FPGA. DSP48E1s can be used for the divide operation, but depending on the number of divides needed the number of available DSP48E1s may run low.

5.1.8  Map  to  Bits. 

After equalization the estimated symbols are mapped to bits. Since M-ary QAM is used with M = 4 at the transmitter, the map to bits algorithm is just the sign bits from the real and imaginary components of the complex symbol. The amount of resources this logic, in Block 9, uses is negligible compared to the QRD and backwards substitution.
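
A sign-bit demapper for 4-QAM is small enough to show in full. The bit convention below (a negative component maps to 1, as a two's complement sign bit would) is an assumption; the source does not state which polarity or bit ordering the transmitter uses, and the equalized symbol values are made up.

```python
def qam4_to_bits(symbol):
    """Assumed mapping: negative component -> bit 1, otherwise bit 0."""
    return (1 if symbol.real < 0 else 0, 1 if symbol.imag < 0 else 0)


# Made-up equalized symbols, one per quadrant.
equalized = [0.9 + 1.1j, -0.8 + 0.7j, -1.2 - 0.9j, 1.0 - 1.3j]
bits = [qam4_to_bits(s) for s in equalized]
```

No arithmetic is involved beyond reading two sign bits, which is consistent with the negligible resource count reported for Block 9.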

5.2  Resource  Use  Measurements 

To gain intuition on the number of resources needed as a function of Na, a VLSI implementation is carried out for Na ∈ {2, 3, 4}. Extrapolation is used to determine trends as a function of the number of QRD blocks instantiated, Nqr, and Na. This section discusses how the receiver design leverages a pipelined architecture to improve resource usage and increase the data rate. Results are reported for resource usage per component as a function of Na.

5.2.1 Pipelined Latency.

The  MIMO  receiver  is  developed  as  a  series  of  calculation  blocks.  Each  calculation 
block  performs  its  particular  task  and  provides  its  outputs  to  the  next  block.  In  designing 
each  block,  consideration  is  taken  to  ensure  that  the  block  is  able  to  handle  the  assumption 
that  the  Analog  to  Digital  Converters  (ADCs)  are  constantly  taking  data. 

In Figure 5.2 the ADC Sample column represents the sampling of data. Each numbered block of data in Figure 5.2 represents one OFDM symbol with the CP removed. Each OFDM symbol is then N samples long. Once the receiver is synchronized the samples are fed directly into the FFT. The relation between the ADC Sample column and the FFT(in) column is that as data are being sampled there is no delay to provide the samples as input to the FFT.

The next column, FFT Calc, represents the pipelined architecture of the Xilinx core. Each block in this column lists the OFDM symbols that are currently being calculated by the core. Up to three OFDM symbols can be in calculation in the core at once. Once the FFT core has finished converting the time domain signal to the frequency domain, the samples are output serially as represented in the FFT(out) column.

The FFT(out) samples are provided to the store samples block, which is used to delineate between data symbols and pilot symbols. The input to this block, column Store Samples(in), stores the samples and assigns them in column Assign Pilot/Data Syms. The block then outputs the pilots (Pilots Out) and the data symbols (Data Syms Out).

The Pilots Out column provides the inputs to the impulse response estimator. The pilot tones are used to estimate each impulse response. The output of the process is shown in column h est(out). Since the estimated impulse response needs to be zero padded for the FFT algorithm, FFT(in) starts as h est(out) starts but also provides the N - L zeros needed to complete the N input samples for the FFT block.

Figure 5.2: OFDM symbol cycle diagram showing the flow of OFDM symbols as numbered blocks. The N samples in each numbered block pass through the stages of calculation in the MIMO receiver. Some calculations have a higher latency (e.g. FFT), while others have low latency because samples are passed into the calculation while sampling.


Once  again  FFT  Calc  and  FFT  (out)  columns  provide  timing  delays  similar  to  the 
previous  FFT  cores.  The  result  of  the  FFT  is  then  provided  to  the  QRD  block,  QR(in). 
After  Nqr  =  11  clock  cycles  at  the  data  rate  clock  speed  the  first  QRD  is  complete.  A  more 
detailed  timing  diagram  of  how  the  QRD  cores  are  scheduled  is  provided  in  Figure  5.3. 

The outputs of the QRD block, QR(out), are the inputs to the equalize block. Column Equalize(in) not only considers the inputs from the QRD but also from the Data Syms Out column because these are the received samples that are going to be equalized. It is shown that the store samples block provides the data symbols needed by the equalization block when the CE is calculated. A First-In First-Out Memory Block (FIFO) is used to store the data symbols until they are needed.

Once  the  QRD  data  is  provided  to  the  equalize  block  in  Equalize(in)  the  equalize 
block  then  calculates  and  outputs  the  equalized  data  symbols.  The  equalized  data  symbols 
are  then  mapped  to  bits,  which  can  be  done  on  the  fly  since  4-QAM  is  used  and  the  sign  bit 
is  used  as  the  demodulated  bit. 


The  QRD  algorithm  is  used  to  invert  the  channel  matrices.  This  calculation  represents 
the  majority  of  the  computational  load  of  the  MIMO  receiver.  For  this  reason,  the  QRD  time 
multiplexing  arbitrator  is  discussed  in  further  detail.  The  backwards  substitution  process  is 
also  included  in  this  discussion  because  the  two  processes  are  used  to  equalize  the  received 
data  symbols. 

The parameters used for the calculation of Nqr are provided in Table 5.1. After downsampling, the frequency response samples are output from the FFT blocks with a period of Tnyq. The QRD blocks operate at a faster clock speed with period Tcalc. The number of clock cycles needed for the QRD, Cqr, is reported as a function of Na along with the total time needed for one QRD block to finish calculation, Tqr = Cqr Tcalc, and finally

$$N_{qr} = \left\lceil \frac{T_{qr}}{T_{nyq}} \right\rceil \tag{5.4}$$

For Na = 2, Tqr / Tnyq = 2040 ns / 200 ns = 10.2, which rounds up to Nqr = 11.
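
The Nqr values in Table 5.1 can be reproduced with a short calculation. The sketch assumes Equation (5.4) rounds the ratio Tqr / Tnyq up to the next integer, which is the reading consistent with the table (2040 ns / 200 ns = 10.2 gives Nqr = 11); the function and variable names are illustrative only.

```python
def num_qrd_blocks(c_qr_cycles, t_calc_ns, t_nyq_ns):
    """Number of QRD blocks needed so one is always free when a matrix arrives."""
    t_qr_ns = c_qr_cycles * t_calc_ns      # Tqr = Cqr * Tcalc
    return -(-t_qr_ns // t_nyq_ns)         # integer ceiling of Tqr / Tnyq


# Cqr per Na from Table 5.1; Tcalc = 10 ns and Tnyq = 200 ns for every Na.
c_qr = {2: 204, 3: 300, 4: 400}
n_qr = {na: num_qrd_blocks(cycles, 10, 200) for na, cycles in c_qr.items()}
```

Integer arithmetic is used for the ceiling so the result does not depend on floating-point rounding.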


Figure 5.3 starts with the second FFT(out) from Figure 5.2. The first set of samples out of the FFT represents H1 and the second clock cycle provides H2. For an Na = 2 system both of these subcarriers are used for pilot tones, so the QRD is not needed for these matrices. The next clock cycle provides H3, which is the first matrix that is provided to the QRD blocks. The first of the Nqr = 11 instantiated QRD blocks is assigned the task of decomposing H3. The next clock cycle H4 is provided, and the second QRD block is used to decompose this matrix. This process continues until H14 arrives. As H13 is assigned to the 11th QRD block, the first QRD block is completed and H14 can be assigned to the first QRD block. At this point, the QRD blocks finish, at worst, the clock cycle before the next channel matrix is available. The skipping of pilot tone subcarriers frees up QRD blocks but is inconsequential in the total execution time of the QRD calculations, particularly if the last subcarrier is a data subcarrier.

Figure 5.3: Sample based resolution of the QRD and equalization processes in the MIMO receiver, for Nqr = 11.

Table 5.1: Nqr calculation parameters

Para. Name    Na = 2        Na = 3        Na = 4
Tnyq          200 ns        200 ns        200 ns
Tcalc         10 ns         10 ns         10 ns
Cqr           204 cycles    300 cycles    400 cycles
Tqr           2040 ns       3000 ns       4000 ns
Nqr           11            15            20
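
The round-robin assignment of channel matrices to QRD blocks can be simulated to confirm that Nqr = 11 blocks never stall for the Na = 2 parameters in Table 5.1. This is a simplified sketch: it assumes matrices arrive every Tnyq starting with H3, ignores any further pilot subcarriers, and uses an arbitrary last index of 30. None of these details are taken from the actual arbitrator design.

```python
def schedule(first_k, last_k, n_qr, t_nyq_ns, t_qr_ns):
    """Assign H_k to QRD blocks in rotation.

    Returns a list of (k, block number, start time, wait time) tuples,
    where wait time is how long H_k waits for its block to become free.
    """
    free_at = [0] * n_qr
    log = []
    for i, k in enumerate(range(first_k, last_k + 1)):
        block = i % n_qr
        arrival = i * t_nyq_ns
        start = max(arrival, free_at[block])
        free_at[block] = start + t_qr_ns
        log.append((k, block + 1, start, start - arrival))
    return log


# Na = 2 values from Table 5.1: Tnyq = 200 ns, Tqr = 2040 ns, Nqr = 11.
log = schedule(3, 30, n_qr=11, t_nyq_ns=200, t_qr_ns=2040)
stalls = [entry for entry in log if entry[3] > 0]

# With one block fewer, a block is reused after 2000 ns, before it finishes.
stalls_with_10 = [e for e in schedule(3, 30, 10, 200, 2040) if e[3] > 0]
```

With 11 blocks each block is reused every 2200 ns, which exceeds the 2040 ns decomposition time, so H14 can go to the first block without waiting.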

This same process is used in the backwards substitution calculations; however, the backwards substitution block needs to await the output of the QRD block before calculations can begin. Once again there are Nqr = 11 Equalize blocks instantiated to keep up with the data rate needs. As an equalize block completes, the next QRD finishes to provide inputs for the next equalize process to begin.

Figure 5.2 is representative of the latency of a MIMO-OFDM receiver for any Na. Generally, as Na increases each calculation requires a longer amount of time for the inputs to be provided, which scales Figure 5.2 and Figure 5.3 vertically.

5.2.2  Resource  Usage. 

In this section, possible target FPGAs are discussed. The amount of resources each FPGA provides is considered for the MIMO receiver. These available resources are then compared to the usage of the MIMO receiver as a function of Nqr and Na. The number of QRD blocks, and subsequently the equalization blocks, greatly varies the amount of resources used. If Nqr is decreased, resources are saved at the cost of data rate. The tradeoffs of Na, Nqr and data rate are considered for the MIMO receiver.

The FPGAs considered are the three largest currently provided by Xilinx. The amount of resources available on the VX980T, VX1140T, and VX2000T is provided in Table 5.2. The VX980T is the target for the implementations carried out in this paper.

Table 5.2: Resources available on Xilinx Virtex-7 FPGAs [51].

FPGA       Slice Regs    Slice LUTs    DSP48E1s
VX980T     1,224K        612K          3,600
VX1140T    1,424K        712K          3,360
VX2000T    2,443K        1,221.6K      2,160
Table 5.3 provides implementation results for the nine blocks for Na = 2. The block numbers correspond to block numbers provided in Section 5.1. For each of the blocks the number of resources was calculated. The Block RAM usage is not reported since the DSP48E1s and LUTs were found to be the limiting factor in the resource constraints on the FPGA.

As Nqr is increased from one, the scalability is determined as a function of the number of QRD blocks instantiated. Blocks 7 and 8 scaled linearly with Nqr. This is done to provide intuition for the Na = 3 and Na = 4 cases, where the full real-time implementation does not fit on the VX980T FPGA. For Na = 3, up to Nqr = 5 QRD and Equalize blocks were instantiated. The true resource counts for the nine blocks for each value of Nqr ∈ {1, 2, 3, 4, 5} are then linearly extrapolated to determine how many resources are needed for each block for Na = 3. The same technique is used for Na = 4 where Nqr ∈ {1, 2, 3, 4}.

In Table 5.4, the estimated amount of resources used for Na = 3 is provided. Comparing the values in Table 5.3 and Table 5.4, in most cases there is an increase in resources needed. These blocks show a dependence on Na. However, Block 1, the correlation block, does not increase because of the multi-channel architecture. The same amount of resources is used but the logic is run at Na times the ADC's clock frequency. However, this architecture has a maximum operating frequency. As Na increases, the limit is reached. Once this occurs, another instance of the multi-channel FIR filter would need to be instantiated.

There is one block that decreases in resources used: Block 3, the block that stores the pilot samples and the data symbol samples. This is because the Xilinx tools made use of Block RAM, saving slices for use in arithmetic calculations. This could have been done in the Na = 2 case but it was not necessary since the amount of slices was not constrained.

The use of Block RAM also impacts latency of the calculation blocks where the dual-port architecture, described in Section 5.1.3, limits the number of memory accesses per clock cycle. If LUTs are used to store the values they are all available in a clock cycle. However, if Block RAM is used the latency is increased because more clock cycles are needed to retrieve the data.

Finally, comparing Table 5.4 to Table 5.5, the amount of resources for each block increases, again with the exception of the multi-channel FIR filter. However, Blocks 5 and 8 reduce in registers and LUTs and utilize DSP48E1 blocks. There is a trade off between DSP48E1s and slice logic, namely execution speed, but if one resource is constrained the other may alleviate the constraint. Attributes in VHDL control whether DSP48E1 blocks are inferred from the design or avoided. Consider the three FPGAs outlined in Table 5.2. Even though the VX2000T is a larger FPGA in terms of registers and LUTs, it is not in terms of DSP48E1 blocks. The amount of resources available dictates what design attributes are needed.

To gain intuition on how the MIMO receiver design grows as a function of Na, extrapolation is used once again with the resource counts for Na ∈ {2, 3, 4}. The extrapolation is done for each block separately; in this case a quadratic fit is used for most blocks since, for example, the backwards substitution algorithm is O(n^2). Some problematic blocks such as equalization, where DSP48E1s are used instead of slices, make the extrapolation an estimate. In this case an increasing linear fit is used.

Table 5.3: Block utilization in MIMO receiver for Na = 2

Block    Slice Regs    Slice LUTs    DSP48E1s
1        1,114         2,976         0
2        3,527         2,466         36
3        9,160         7,618         0
4        6,601         5,844         72
5        2,578         3,451         0
6        6,939         4,930         72
7        158,986       327,015       1,584
8        19,148        35,965        99
9        1             0             0
Total    208,054       390,265       1,863
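
The per-block quadratic extrapolation can be illustrated with the Block 4 slice register counts from Tables 5.3, 5.4 and 5.5 (6,601, 14,851 and 26,401 for Na = 2, 3, 4). The sketch uses exact Lagrange interpolation through the three points; this is an assumption made for illustration, since the fitting procedure actually used is not specified beyond "quadratic fit", and the extrapolated value is not a figure quoted from the source.

```python
from fractions import Fraction


def lagrange_fit(points):
    """Return f(x) for the unique polynomial through the given (x, y) points."""
    def f(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return f


# Block 4 slice registers for Na = 2, 3, 4 (Tables 5.3, 5.4, 5.5).
block4_regs = [(2, 6601), (3, 14851), (4, 26401)]
fit = lagrange_fit(block4_regs)

# Extrapolate beyond the implemented sizes, for example to Na = 6.
estimate_na6 = fit(6)
```

For these three points the quadratic works out to 1650 Na^2 + 1, so the extrapolated count for Na = 6 is 59,401 slice registers under this fit.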

Figure 5.4 shows the number of LUTs vs DSP48E1s. For values of Na ∈ {4, 6, 8, 10}, asterisks are used to represent the estimated utilization of these real-time implementations. The number of FPGAs needed to satisfy the amount of resources needed by these implementations is represented by the lines. It is necessary to note that in this type of implementation, where multiple FPGAs are used, an added step of optimizing the network of FPGAs is needed. Assigning calculation blocks to specific FPGAs is not a trivial task.

Table 5.4: Block utilization in MIMO receiver for Na = 3

Block    Slice Regs    Slice LUTs    DSP48E1s
1        1,114         2,976         0
2        5,233         3,698         54
3        6,580         7,058         0
4        14,851        13,149        162
5        2,773         3,807         0
6        15,469        11,093        162
7        501,649       1,423,901     4,860
8        33,709        91,209        135
9        1             0             0
Total    581,378       1,556,889     5,373
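
A rough lower bound on the number of FPGAs a design needs follows from dividing each total in Tables 5.3 and 5.4 by the corresponding capacity in Table 5.2 and taking the worst case. This sketch is only a counting exercise: it assumes resources can be split freely across devices and ignores routing, partitioning and inter-FPGA bandwidth, which the text notes are not trivial.

```python
AVAILABLE = {  # Table 5.2
    "VX980T": {"regs": 1_224_000, "luts": 612_000, "dsp": 3_600},
    "VX1140T": {"regs": 1_424_000, "luts": 712_000, "dsp": 3_360},
    "VX2000T": {"regs": 2_443_000, "luts": 1_221_600, "dsp": 2_160},
}
TOTALS = {  # totals from Table 5.3 (Na = 2) and Table 5.4 (Na = 3)
    2: {"regs": 208_054, "luts": 390_265, "dsp": 1_863},
    3: {"regs": 581_378, "luts": 1_556_889, "dsp": 5_373},
}


def min_fpgas(usage, capacity):
    """Lower bound: the most constrained resource sets the device count."""
    return max(-(-usage[r] // capacity[r]) for r in usage)


counts = {
    na: {name: min_fpgas(usage, cap) for name, cap in AVAILABLE.items()}
    for na, usage in TOTALS.items()
}
```

Under these assumptions the Na = 2 design fits on a single device, while the Na = 3 estimate needs at least three of any of the listed devices: the LUT count is the binding resource on the VX980T and VX1140T, and the DSP48E1 count is on the VX2000T.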

With the estimates of resource utilization calculated, the number of resources per calculation block for each value of Nqr and Na is available. Looking at Table 5.3, Table 5.4 and Table 5.5, the block with the largest resource usage is the QRD. The second largest is the equalization block. These blocks are linked: if Nqr is reduced, the input to the equalization block is reduced, requiring fewer matrix-vector multipliers and backwards substitution blocks. The number of these blocks is directly related to how much time is available for calculation before the next OFDM symbol is ready for calculation (illustrated in Figure 5.3). If there were a delay between OFDM symbols, that is, operating in a burst communication mode, the number of QRD, matrix-vector multiplication and backwards substitution blocks could be reduced.

Table 5.5: Block utilization in MIMO receiver for Na = 4

Block    Slice Regs    Slice LUTs    DSP48E1s
1        1,114         2,976         0
2        6,939         4,932         72
3        8,848         9,401         0
4        26,401        23,360        288
5        2,434         2,813         40
6        25,705        18,480        270
7        2,243,379     81,706,853    79,113
8        51,132        44,757        1,921
9        1             0             0
Total    2,366,000     81,813,527    82,704

Figure 5.5 shows the total number of resources used as a function of Na. The total number of resources is compared to the usage of just the QRD block, Block 7. The QRD makes up about 76-98 percent of the total design depending on Na and the resource. As Na increases the QRD increases faster than any other block.

The delay between OFDM symbols is denoted TL. If TL is set to zero, there is no gap between OFDM symbols. In this case the MIMO receiver is real-time compatible.

Figure 5.4: Extrapolated slice LUTs and DSP48E1 usage for Na = [4, 6, 8, 10]. Also pictured are the available resources for 100, 525, 1050 and 1900 FPGAs.


If the real-time capability constraint were lifted, effectively making TL > 0, the design size could be reduced at the cost of the data rate achievable by the receiver. The data rate the receiver is capable of is given by:

$$R = \frac{\log_2(M)\, N_a (N - L N_a)}{(T_s N') + T_L} \tag{5.5}$$

where M is the constellation order, N - NaL is the number of data subcarriers, Ts is the sampling period at base band (the inverse of the sampling frequency), N' = N + Ncp, Ncp is the CP length and TL is the amount of time between OFDM symbols, which corresponds to the amount of additional time now available to the QRD and equalization blocks for calculation.

Figure 5.5: Total resource usage as a function of Na compared to the QRD block for real-time compliant MIMO receiver. (a) Slice Register Usage. (b) Slice LUT Usage. (c) DSP48E1 Usage.
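
Equation (5.5), read here as R = log2(M) Na (N - L Na) / (Ts N' + TL), can be evaluated directly to see how a gap between OFDM symbols lowers the data rate. In the sketch below, M = 4, N = 64 and Ts = 200 ns come from the Figure 5.6 caption and L = 5 from the worked example in the text; the CP length of 16 samples and the 4 microsecond gap are hypothetical values chosen only for illustration.

```python
from math import log2


def data_rate_bps(M, n_a, N, L, n_cp, t_s, t_l=0.0):
    """Data rate in bits per second for the receiver parameters given."""
    data_tones = N - n_a * L                 # subcarriers left after pilots
    bits_per_ofdm_symbol = log2(M) * n_a * data_tones
    symbol_time = t_s * (N + n_cp) + t_l     # Ts * N' + TL
    return bits_per_ofdm_symbol / symbol_time


# n_cp = 16 and t_l = 4e-6 are hypothetical, not values from the source.
r_realtime = data_rate_bps(M=4, n_a=2, N=64, L=5, n_cp=16, t_s=200e-9)
r_burst = data_rate_bps(M=4, n_a=2, N=64, L=5, n_cp=16, t_s=200e-9, t_l=4e-6)

# Bits per OFDM symbol for Na = 6 and Na = 7 with N = 64 and L = 5.
bits_na6 = 2 * 6 * (64 - 6 * 5)
bits_na7 = 2 * 7 * (64 - 7 * 5)
```

With these assumed numbers the real-time rate is 13.5 Mb/s and the burst-mode rate is 10.8 Mb/s. The last two lines show why adding antennas does not always help: at N = 64 and L = 5, going from Na = 6 to Na = 7 changes the bits per OFDM symbol from 408 to 406.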


Figure 5.6 shows the relation between Nqr and data rate. The values for TL were determined empirically by considering the arrival rate from the FFT block, the fact that certain subcarriers do not need to be demixed since they are pilot tones, and the calculation time for a QRD block as a function of Na. These TL values represent the minimum amount of time required between the OFDM symbols for the QRD and equalization chain to complete in time for the next OFDM block. Since the minimum TL is found, the maximum data rate is shown in Figure 5.6. When Nqr = 10 the data rate is increasing rapidly; then at Nqr = 11 the data rate starts to level out. At this point the maximum data rate has been achieved for the system parameters, and the QRD is no longer the limiting factor in the system. To increase the data rate at this point, the constellation order or bandwidth needs to be increased. Also, Na may be increased but this does not necessarily improve the data rate. For instance, N = 64, Na = 6 and L = 5 results in 34 data tones. If Na is increased to Na = 7, of the N = 64 newly added subcarriers available for data transmission only 24 of them are assigned data; the rest are used for pilots. Increasing the number of subcarriers provides a larger data rate increase in this scenario.

Figure 5.6: As the number of QRD blocks instantiated increases, the data rate possible at the receiver increases since subcarriers can be demixed at a faster rate. As the number of antennas in the MIMO system increases the data rate also increases. For these calculations M = 4, N = 64, Ts = 200 ns.

Figure 5.7 shows the number of FPGAs required to implement all of the logic for the
MIMO receiver design. It is assumed that the FPGAs can be interfaced with a bus
capable of maintaining the data rate needed for the FPGAs to communicate effectively.
Low Voltage Differential Signaling (LVDS) interface methods are simple to implement
and provide a high data rate [54].

The log of the number of FPGAs is reported to show trends at lower Nqr. For a given Na
there is some Nqr where the FPGA that should be targeted changes, which is clearer with
the log scale. The reason for the change is that the QRD algorithm uses more LUTs than
DSP48E1s. When Nqr is small, the DSP48E1s (from the FFTs, etc.) dictate which FPGA to
target. As Nqr increases, the LUTs introduced by the newly instantiated QRD blocks
dictate which FPGA to target.
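
The change of target device can be illustrated with a small Python sketch. Every number below is hypothetical: the two device profiles, the fixed resource cost of the non-QRD blocks and the per-QRD-block cost are placeholders chosen only to reproduce the qualitative behavior described above, and are not values from the utilization reports. The device count is taken as the larger of the LUT-limited and DSP-limited counts.

```python
import math

# Hypothetical device profiles: (available slice LUTs, available DSP slices).
DEVICES = {
    "dsp_rich": (300_000, 3_000),
    "lut_rich": (600_000, 1_500),
}


def fpgas_needed(luts, dsps, dev_luts, dev_dsps):
    """Number of identical devices needed so both LUT and DSP demand fit."""
    return max(math.ceil(luts / dev_luts), math.ceil(dsps / dev_dsps))


def best_target(n_qr, base_luts=50_000, base_dsps=2_800,
                qrd_luts=40_000, qrd_dsps=20):
    """Pick the device type that needs the fewest parts for n_qr QRD blocks.

    base_* model the DSP-heavy fixed blocks (FFTs, etc.) and qrd_* model one
    LUT-heavy QRD block. All four defaults are made-up illustration values.
    """
    luts = base_luts + n_qr * qrd_luts
    dsps = base_dsps + n_qr * qrd_dsps
    counts = {name: fpgas_needed(luts, dsps, *cap)
              for name, cap in DEVICES.items()}
    name = min(counts, key=counts.get)
    return name, counts[name]


if __name__ == "__main__":
    for n_qr in (1, 10, 30):
        print(n_qr, best_target(n_qr))
```

With these placeholder numbers, a small Nqr is limited by DSP slices and favors the DSP-rich part, while a large Nqr is limited by LUTs and favors the LUT-rich part, mirroring the crossover visible in Figure 5.7.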

5.3  Conclusions 

This chapter discusses a full MIMO-OFDM communications receiver where the
number of antennas used at each transmitter and receiver is Na = 2, 3 and 4. Each
component in the receiver is described and resource utilization reports are provided. Trends
in resource usage were extrapolated for up to Na = 10. The number of QRD and
equalization blocks was varied to determine the tradeoff between data rate and resource
usage.

It is found that for Na = 2 the full real-time system is implementable on the Xilinx
VX980T FPGA. For Na = 3 and Na = 4, the full real-time system could not be implemented
on a single FPGA since the design uses too many resources. The number of QRD and
equalization blocks was reduced (reducing the data rate) for the Na = 3 and Na = 4 designs
to successfully fit on the VX980T.

Figure 5.7: As the number of QRD blocks instantiated increases, the number of FPGAs
needed to implement the receiver increases. As the number of antennas in the MIMO
system increases, the number of FPGAs needed to implement the receiver also increases.


With this information, the relationships between data rate, Nqr and Na were
established. The graphs provided also relate the number of FPGAs to Nqr and Na.
Finally, the choice of target FPGA is considered when a variable number of slices
and DSP48E1s are needed.

The QRD has been presented as the most expensive block, in terms of resources and
computational time, in the MIMO receiver algorithm. Going forward with this work, the
QRD algorithms described in [55] offer a reduced-area alternative that may be suitable for
the receiver described in this paper. Other methods for matrix decomposition, such as [56],
may also have an advantage for this particular application.

VI.  Conclusions


This  chapter  concludes  the  dissertation  by  reiterating  the  results  from  the  three  thrusts 
of  the  dissertation.  Also  mentioned  are  the  journal  articles  published  as  a  result  of  the 
findings  in  this  dissertation. 

6.1  Covert  MIMO  Communications 

Chapter III describes and analyzes a transmitter and receiver topology in which
IBI is leveraged to provide a physical layer security measure. The intended receiver's
performance is orders of magnitude better in terms of BER compared to an Ex at θ =
45°. This result shows that IBI, considered a hindering effect in traditional multicarrier
communications, can be used to degrade unintended users' performance.

Also in Chapter III the derivation of IBI in MIMO communications is presented. This
derivation considers the influence of IBI for a given set of NrNt channels. A particular
topology is not assumed in the derivation; only the channel lengths and their relation to
the CP length drive the effect of IBI. Whether the length of the channel is elongated
by topological delays or by the environment itself, the IBI and ultimately the SINR are
characterized for a general MIMO communication system for specific channels.

The task of estimating the channels in a MIMO system is also considered. The CE
accuracy affects the performance of the receiver. Two CE algorithms, the TDCE and the
FDCE, are considered and their performance is compared. The FDCE is the method of
choice for the cooperative system. For Ex the TDCE should be leveraged, provided that the
delays between users are known or can be estimated.

6.2  TOA  Estimation  with  TDOA  extension 

As the Ex, the relative delays between users are needed to improve detection
performance. Chapter IV provides simulation and experimental results to support the
ability to estimate the TOA. First, simulations are conducted of the van de Beek and
Acharya methods. The success of both algorithms then leads into a small scale hardware
implementation.

The small scale hardware implementation consists of two WARP boards. The first of
these uses four radio cards that represent four users. For each of the users the transmit signal
is delayed. The second WARP board utilizes a single receive radio card that estimates the
four delay values. The van de Beek and Acharya methods are compared. While the van de
Beek method is able to provide relative delays, a delay cannot be tied to a specific
user. This is problematic for Ex because the equalization process requires the delay for
each user.

The Acharya method does, however, provide the delay per user. This is accomplished
by leveraging the subcarriers occupied by each user. In the OFDMA scheme each user is
allotted specific subcarriers. If this allocation is known at Ex, the delays can be estimated.

A natural extension of this work is TDOA positioning. TDOA measurements are also
examined via a large scale hardware implementation. For the large scale hardware
implementation to be successful, synchronization between receivers needs to be
established. This is successfully accomplished with a reference transmitter. A hallway
test is completed to provide a proof of concept for this method.

To provide a more interesting testbed another limitation must be overcome. The
Ethernet cables that link the controlling computer to the WARP board receivers must be
removed to enlarge the testbed. This is accomplished with the Ethernet bridge, which gives
future students a platform on which to implement TDOA-based positioning algorithms.

6.3  FPGA  Implementation  Scalability  for  MIMO  Receivers 

Chapter V implements a MIMO receiver with 2, 3, and 4 transmit and receive
antennas. The FPGA resources are reported for each of these implementations in terms
of Slice LUTs, Slice Registers and DSP48E1s for Xilinx FPGAs. The resources used by
each of the nine components of the receiver are extrapolated for larger MIMO systems. The
trends gathered from the extrapolation provide intuition on the number of FPGAs needed
to implement a MIMO receiver for a specific data rate.

Data rate is also considered as a function of resource usage. The tradeoff between
resources used and the data rate the receiver is capable of is expressed in terms of the
number of QRD blocks instantiated. As the number of QRD blocks instantiated
decreases, the data rate is reduced and the receiver requires the OFDM symbols to arrive
less frequently.

6.4  Publications 

The work done in this dissertation led to two journal article submissions and
potentially a conference paper. The work described in Chapter III was submitted
to IEEE Transactions on Information Forensics and Security on 10 January 2014,
under the title “Throughput Preserving Physical Layer Security Leveraging Inter-Block
Interference”. The article is still under review at this time.

The second journal article takes the form of Chapter V. The work described was
submitted to IEEE Transactions on Circuits and Systems I on 5 March 2014 under the
title “FPGA Resource Scalability Study for Large-Scale MIMO-OFDM Receivers”. The
article is still under review.

The implementation of the TDOA work in Chapter IV is being prepared for
submission to the Asilomar Conference on Signals, Systems, and Computers with a due date
of 1 May 2014.

6.5  Future  Work 

The topology analyzed in Chapter III provides insight for that particular military
application. Other interesting topologies could include cellular networks where the
role of the eavesdropper is played by a mobile user in an adjacent cell. A natural extension
of this contribution would be to develop a testbed for this work. An FPGA implementation
of the TDCE would be very complex for such a platform. The added ability to estimate the
TOA values for each user via the Acharya method on multiple FPGAs would be very
interesting. However, the Acharya method is too complex for a real-time implementation.
Determining a less computationally complex estimator would be beneficial for practical
applications.

Further work into the calculations involved in channel equalization is needed. For
MIMO receivers to grow in size, the ability to equalize faster is needed. The point of
increasing the number of antennas is to increase data rate, but the computational complexity
of the matrix inversion and equalization is the biggest issue in the design.

Appendix:  Multiple  OFDM  Symbol  TOA  Estimator


Section  5.1  discusses  the  TOA  estimator  for  multiple  OFDM  symbols.  The  PDF  of 
the  received  OFDM  symbol  is: 


where 


and 

Assuming each received OFDM symbol is independent, the log-likelihood function to be
maximized is:

The minimization of the negation of Equation (A.11) provides the estimator used in
Section 5.1:


\hat{\theta} = \arg\min_{\theta} \{ \cdots \}    (A.12)
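
As a rough illustration of how a CP-based timing metric can be accumulated over several independent OFDM symbols, the Python sketch below uses the textbook single-user van de Beek metric, |gamma(m)| - rho * Phi(m), summed over consecutive symbols. It is not the estimator derived in this appendix. The function names, the AWGN-only channel, the known SNR, the random time-domain symbols and all parameter values are assumptions made for illustration.

```python
import math
import random


def make_ofdm_symbol(N, Ncp, rng):
    """Random unit-power symbol body with its last Ncp samples prepended as CP."""
    body = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
            for _ in range(N)]
    return body[-Ncp:] + body


def timing_metric(r, N, Ncp, rho):
    """Return |gamma(m)| - rho * Phi(m) for every candidate start index m."""
    metric = []
    for m in range(len(r) - (N + Ncp) + 1):
        gamma, phi = 0j, 0.0
        for k in range(m, m + Ncp):
            # Correlate each sample with the one N samples later.
            gamma += r[k] * r[k + N].conjugate()
            phi += 0.5 * (abs(r[k]) ** 2 + abs(r[k + N]) ** 2)
        metric.append(abs(gamma) - rho * phi)
    return metric


def estimate_toa(r, N, Ncp, n_symbols, snr_linear):
    """Sum the metric over n_symbols symbol periods and pick the best offset."""
    rho = snr_linear / (snr_linear + 1.0)
    metric = timing_metric(r, N, Ncp, rho)
    period = N + Ncp
    summed = [sum(metric[m + i * period] for i in range(n_symbols))
              for m in range(period)]
    return max(range(period), key=lambda m: summed[m])


def simulate(delay, N=64, Ncp=16, n_symbols=4, snr_db=20.0, seed=0):
    """Delay a short OFDM burst by `delay` samples, add noise, estimate the delay."""
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)
    sigma = math.sqrt(1 / (2 * snr))  # noise std dev per real dimension
    tx = [0j] * delay
    for _ in range(n_symbols + 1):  # one extra symbol so every window fits
        tx += make_ofdm_symbol(N, Ncp, rng)
    rx = [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in tx]
    return estimate_toa(rx, N, Ncp, n_symbols, snr)


if __name__ == "__main__":
    print("estimated delay:", simulate(delay=23))
```

Summing the per-symbol metric corresponds to adding log-likelihoods of independent symbols, which is the same independence assumption made above. The search is restricted to one symbol period, so the true delay must be smaller than N + Ncp samples in this sketch.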